* [GIT PULL] perf/core improvements and fixes
@ 2019-11-07 18:59 Arnaldo Carvalho de Melo
  2019-11-07 18:59 ` [PATCH 01/63] perf data: Correctly identify directory data files Arnaldo Carvalho de Melo
                   ` (62 more replies)
  0 siblings, 63 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users,
    Arnaldo Carvalho de Melo, Adrian Hunter, Andi Kleen, Haiyan Song,
    Ian Rogers, Igor Lubashev, James Clark, Jin Yao, Jiwei Sun, John Garry,
    Leo Yan, Masami Hiramatsu, Will Deacon, Yunfeng Ye,
    Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

	Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit d44f821b0e13275735e8f3fe4db8703b45f05d52:

  perf/core: Optimize perf_init_event() for TYPE_SOFTWARE (2019-10-28 12:53:28 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191107

for you to fetch changes up to 7fa46cbf20d327d78114b1c8c7e69fabe7c57794:

  perf report: Sort by sampled cycles percent per block for tui (2019-11-07 10:14:48 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf report:

  Jin Yao:

  - Introduce --total-cycles, for basic block profiling, further using data
    obtained from LBR, an example should suffice:

    # perf record -b
    ^C[ perf record: Woken up 595 times to write data ]
    [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]

    # perf evlist -v
    cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY

    # perf report --total-cycles --stdio
    # To display the perf.data header info, please use --header/--header-only options.
    #
    # Total Lost Samples: 0
    #
    # Samples: 6M of event 'cycles'
    # Event count (approx.): 6299936
    #
    # Sampled  Sampled  Avg      Avg
    # Cycles%  Cycles   Cycles%  Cycles  [Program Block Range]                 Shared Object
    # .......  ......   .......  .....   ....................................  ................
    #
        2.17%    1.7M    0.08%     607   [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
        0.72%  544.5K    0.03%     230   [entry_64.S:657 -> entry_64.S:662]    [kernel.vmlinux]
        0.56%  541.8K    0.09%     672   [compiler.h:199 -> common.c:300]      [kernel.vmlinux]
        0.39%  293.2K    0.01%     104   [list_debug.c:43 -> list_debug.c:61]  [kernel.vmlinux]
        0.36%  278.6K    0.03%     272   [entry_64.S:1289 -> entry_64.S:1308]  [kernel.vmlinux]

perf record:

  Adrian Hunter:

  - Allow storing perf.data in a directory together with a copy of
    /proc/kcore.

  Jiwei Sun:

  - Add support for limiting the perf output file size, i.e.:

    # perf record --all-cpus -F 10000 --max-size=4M sleep 10h
    [ perf record: perf size limit reached (4097 KB), stopping session ]
    [ perf record: Woken up 6 times to write data ]
    [ perf record: Captured and wrote 4.048 MB perf.data (54094 samples) ]
    Terminated
    # ls -lah perf.data
    -rw-------. 1 root root 4.1M Nov 7 15:27 perf.data
    #

perf stat:

  Jiri Olsa:

  - Add --per-node aggregation support:

    In live mode:

    # perf stat -a -I 1000 -e cycles --per-node
    #        time node  cpus     counts unit events
      1.000542550 N0      20  6,202,097      cycles
      1.000542550 N1      20    639,559      cycles
      2.002040063 N0      20  7,412,495      cycles
      2.002040063 N1      20  2,185,577      cycles
      3.003451699 N0      20  6,508,917      cycles
      3.003451699 N1      20    765,607      cycles
    ...
    Or in the record/report stat session:

    # perf stat record -a -I 1000 -e cycles
    #        time      counts unit events
      1.000536937  10,008,468      cycles
      2.002090152   9,578,539      cycles
      3.003625233   7,647,869      cycles
      4.005135036   7,032,086      cycles
    ^C 4.340902364  3,923,893      cycles

    # perf stat report --per-node
    #        time node  cpus     counts unit events
      1.000536937 N0      20  9,355,086      cycles
      1.000536937 N1      20    653,382      cycles
      2.002090152 N0      20  7,712,838      cycles
      2.002090152 N1      20  1,865,701      cycles
    ...

perf probe:

  Masami Hiramatsu:

  Various fixes related to recent additions to the DWARF format:

  - Fix to find range-only function instance
  - Walk function lines in lexical blocks
  - Fix to show function entry line as probe-able
  - Fix wrong address verification
  - Fix to probe a function which has no entry pc
  - Fix to probe an inline function which has no entry pc
  - Fix to list probe event with correct line number
  - Fix to show inlined function callsite without entry_pc
  - Fix to show ranges of variables in functions without entry_pc
  - Return a better scope DIE if there is no best scope
  - Skip end-of-sequence and non statement lines
  - Filter out instances except for inlined subroutine and subprogram
  - Fix to show calling lines of inlined functions
  - Skip overlapped location on searching variables

perf inject:

  Adrian Hunter:

  - Do not strip evsels with --strip, as they are needed for create_gcov
    (see the autofdo example in tools/perf/Documentation/intel-pt.txt).

Intel PT:

  Adrian Hunter:

  - Intel PT uses an auxtrace_cache to store the results of code-walking,
    to avoid repeated decoding. Add an auxtrace_cache__remove to handle
    text poke events.

core:

  Andi Kleen:

  - Always preserve errno while cleaning up perf_event_open failures.

llvm:

  Arnaldo Carvalho de Melo:

  - There is no need to report that the request to save a .o file for BPF
    events, as expressed in ~/.perfconfig, was satisfied; make that a
    debug message.

perf vendor events:

  Intel:

  Haiyan Song:

  - Update CascadelakeX events to v1.05.
  - Update all the Intel JSON metrics from TMAM 3.6.

Treewide:

  Ian Rogers:

  - Improve error paths, plugging leaks found using LLVM tools such as
    libFuzzer.

jevents:

  Yunfeng Ye:

  - Fix resource leak in process_mapfile() and main()

perf kvm:

  Igor Lubashev:

  - Use evlist layer api when possible.

libsubcmd:

  James Clark:

  - Move EXTRA_FLAGS to the end to allow overriding existing flags.

  - Use -O0 with DEBUG=1

perf diff:

  Jin Yao:

  - Don't use hack to skip column length calculation

CoreSight ETM:

  Leo Yan:

  - Fix definition of macro TO_CS_QUEUE_NR

ARM64:

  John Garry:

  - Do not try to include libelf header files when its feature detection
    failed, fixing the cross build for ARM64.

perf tests:

  Leo Yan:

  - Fix out of bounds memory access in the backward ring buffer test.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (9):
      perf data: Correctly identify directory data files
      perf data: Move perf_dir_version into data.h
      perf data: Rename directory "header" file to "data"
      perf data: Support single perf.data file directory
      perf record: Put a copy of kcore into the perf.data directory
      perf auxtrace: Add auxtrace_cache__remove()
      perf dso: Refactor dso_cache__read()
      perf dso: Add dso__data_write_cache_addr()
      perf inject: Make --strip keep evsels

Andi Kleen (2):
      perf evsel: Always preserve errno while cleaning up perf_event_open failures
      perf evsel: Avoid close(-1)

Arnaldo Carvalho de Melo (7):
      perf llvm: Make .o saving a debug message, not an info one
      perf map: Check if the map still has some refcounts on exit
      perf map: Allow map__next() to receive a NULL arg
      perf maps: Add for_each_entry()/_safe() iterators
      perf map_groups: Introduce for_each_entry() and for_each_entry_safe() iterators
      perf symbols: Remove needless checks for map->groups->machine
      perf machine: Add kernel_dso() method

Haiyan Song (2):
      perf vendor events intel: Update CascadelakeX events to v1.05
      perf vendor events intel: Update all the Intel JSON metrics from TMAM 3.6.

Ian Rogers (10):
      perf tools: Move ALLOC_LIST into a function
      perf tools: Avoid a malloc() for array events
      perf tools: Splice events onto evlist even on error
      perf parse: Add parse events handle error
      perf parse: Ensure config and str in terms are unique
      perf parse: Add destructors for parse event terms
      perf parse: Before yyabort-ing free components
      perf parse: If pmu configuration fails free terms
      perf parse: Add a deep delete for parse event terms
      perf annotate: Fix heap overflow

Igor Lubashev (1):
      perf kvm: Use evlist layer api when possible

James Clark (2):
      libsubcmd: Move EXTRA_FLAGS to the end to allow overriding existing flags
      libsubcmd: Use -O0 with DEBUG=1

Jin Yao (7):
      perf diff: Don't use hack to skip column length calculation
      perf block: Cleanup and refactor block info functions
      perf hist: Count the total cycles of all samples
      perf hist: Support block formats with compare/sort/display
      perf report: Sort by sampled cycles percent per block for stdio
      perf report: Support --percent-limit for --total-cycles
      perf report: Sort by sampled cycles percent per block for tui

Jiri Olsa (3):
      perf session: Fix indent in perf_session__new()"
      perf env: Add perf_env__numa_node()
      perf stat: Add --per-node agregation support

Jiwei Sun (1):
      perf record: Add support for limit perf output file size

John Garry (1):
      perf tools: Fix cross compile for ARM64

Leo Yan (3):
      perf cs-etm: Fix definition of macro TO_CS_QUEUE_NR
      perf tests: Fix a typo
      perf tests: Fix out of bounds memory access

Masami Hiramatsu (14):
      perf probe: Fix to find range-only function instance
      perf probe: Walk function lines in lexical blocks
      perf probe: Fix to show function entry line as probe-able
      perf probe: Fix wrong address verification
      perf probe: Fix to probe a function which has no entry pc
      perf probe: Fix to probe an inline function which has no entry pc
      perf probe: Fix to list probe event with correct line number
      perf probe: Fix to show inlined function callsite without entry_pc
      perf probe: Fix to show ranges of variables in functions without entry_pc
      perf probe: Return a better scope DIE if there is no best scope
      perf probe: Skip end-of-sequence and non statement lines
      perf probe: Filter out instances except for inlined subroutine and subprogram
      perf probe: Fix to show calling lines of inlined functions
      perf probe: Skip overlapped location on searching variables

Yunfeng Ye (1):
      perf jevents: Fix resource leak in process_mapfile() and main()

 tools/lib/subcmd/Makefile | 9 +-
 tools/perf/Documentation/perf-record.txt | 7 +
 tools/perf/Documentation/perf-report.txt | 11 +
 tools/perf/Documentation/perf-stat.txt | 5 +
 .../Documentation/perf.data-directory-format.txt | 63 +
 tools/perf/arch/arm64/util/sym-handling.c | 3 +-
 tools/perf/arch/x86/util/event.c | 2 +-
 tools/perf/builtin-annotate.c | 2 +-
 tools/perf/builtin-diff.c | 121 +-
 tools/perf/builtin-inject.c | 54 -
 tools/perf/builtin-kvm.c | 2 +-
 tools/perf/builtin-record.c | 100 +-
 tools/perf/builtin-report.c | 67 +-
 tools/perf/builtin-stat.c | 52 +
 tools/perf/builtin-top.c | 3 +-
 tools/perf/lib/evsel.c | 3 +-
 .../pmu-events/arch/x86/broadwell/bdw-metrics.json | 178 +-
 .../arch/x86/broadwellx/bdx-metrics.json | 184 +-
 .../pmu-events/arch/x86/cascadelakex/cache.json | 12068 +++++++++----------
 .../arch/x86/cascadelakex/clx-metrics.json | 210 +-
 .../arch/x86/cascadelakex/floating-point.json | 92 +-
 .../pmu-events/arch/x86/cascadelakex/frontend.json | 656 +-
 .../pmu-events/arch/x86/cascadelakex/memory.json | 11408 +++++++++---------
 .../pmu-events/arch/x86/cascadelakex/other.json | 9620 +++++++--------
 .../pmu-events/arch/x86/cascadelakex/pipeline.json | 1234 +-
 .../arch/x86/cascadelakex/uncore-memory.json | 191 +
 .../arch/x86/cascadelakex/uncore-other.json | 1585 ++-
 .../arch/x86/cascadelakex/virtual-memory.json | 339 +-
 .../pmu-events/arch/x86/haswell/hsw-metrics.json | 164 +-
 .../pmu-events/arch/x86/haswellx/hsx-metrics.json | 170 +-
 .../pmu-events/arch/x86/ivybridge/ivb-metrics.json | 170 +-
 .../pmu-events/arch/x86/ivytown/ivt-metrics.json | 172 +-
 .../pmu-events/arch/x86/jaketown/jkt-metrics.json | 114 +-
 .../arch/x86/sandybridge/snb-metrics.json | 112 +-
 .../pmu-events/arch/x86/skylake/skl-metrics.json | 188 +-
 .../pmu-events/arch/x86/skylakex/skx-metrics.json | 204 +-
 tools/perf/pmu-events/jevents.c | 13 +-
 tools/perf/tests/backward-ring-buffer.c | 9 +
 tools/perf/tests/bp_signal.c | 2 +-
 tools/perf/tests/map_groups.c | 9 +-
 tools/perf/tests/vmlinux-kallsyms.c | 6 +-
 tools/perf/ui/browsers/hists.c | 7 +-
 tools/perf/ui/browsers/hists.h | 2 +
 tools/perf/ui/stdio/hist.c | 29 +-
 tools/perf/util/Build | 1 +
 tools/perf/util/annotate.c | 2 +-
 tools/perf/util/auxtrace.c | 28 +
 tools/perf/util/auxtrace.h | 1 +
 tools/perf/util/block-info.c | 538 +
 tools/perf/util/block-info.h | 78 +
 tools/perf/util/cpumap.c | 18 +
 tools/perf/util/cpumap.h | 3 +
 tools/perf/util/cs-etm.c | 4 +-
 tools/perf/util/data.c | 46 +-
 tools/perf/util/data.h | 12 +
 tools/perf/util/dso.c | 135 +-
 tools/perf/util/dso.h | 7 +
 tools/perf/util/dwarf-aux.c | 80 +-
 tools/perf/util/dwarf-aux.h | 3 +
 tools/perf/util/env.c | 40 +
 tools/perf/util/env.h | 6 +
 tools/perf/util/evsel.c | 9 +-
 tools/perf/util/header.h | 4 -
 tools/perf/util/hist.c | 13 +-
 tools/perf/util/hist.h | 3 +-
 tools/perf/util/llvm-utils.c | 5 +-
 tools/perf/util/machine.c | 12 +-
 tools/perf/util/map.c | 65 +-
 tools/perf/util/map_groups.h | 24 +-
 tools/perf/util/parse-events.c | 175 +-
 tools/perf/util/parse-events.h | 3 +
 tools/perf/util/parse-events.y | 390 +-
 tools/perf/util/pmu.c | 32 +-
 tools/perf/util/probe-event.c | 2 +-
 tools/perf/util/probe-finder.c | 77 +-
 tools/perf/util/record.h | 1 +
 tools/perf/util/session.c | 8 +-
 tools/perf/util/stat-display.c | 15 +
 tools/perf/util/stat.c | 1 +
 tools/perf/util/stat.h | 1 +
 tools/perf/util/symbol.c | 64 +-
 tools/perf/util/symbol.h | 24 -
 tools/perf/util/symbol_conf.h | 1 +
 tools/perf/util/synthetic-events.c | 2 +-
 tools/perf/util/thread.c | 2 +-
 tools/perf/util/util.c | 19 +-
 tools/perf/util/vdso.c | 4 +-
 87 files changed, 22145 insertions(+), 19453 deletions(-)
 create mode 100644 tools/perf/Documentation/perf.data-directory-format.txt
 create mode 100644 tools/perf/util/block-info.c
 create mode 100644 tools/perf/util/block-info.h

Test results:

The first ones are container based builds of tools/perf with and without
libelf support. Where clang is available, it is also used to build perf
with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with
gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from an http server to
build them inside the container, to make it easier to build in a container
cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and
those may not have all the features built, due to lack of multi-arch devel
packages, available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf
commands with a variety of command line event specifications to then
intercept the sys_perf_event syscall to check that the perf_event_attr
fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build
tools/perf/ with a variety of feature sets, exercising the build with an
incomplete set of features as well as with a complete one. It is planned to
have it run on each of the containers mentioned above, using some container
orchestration infrastructure. Get in contact if you are interested in
helping to get this in place.

Clearlinux is failing when building with libpython, but that is not a perf
regression; I will try to remove one compiler warning that is causing the
problem when building some of the glue code in the python files, outside
perf.

Manjaro is failing due to some missing library related to bison; this looks
like a distro bug.

  # export PERF_TARBALL=http://192.168.124.1/perf/perf-5.4.0-rc5.tar.xz
  # dm
   1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
   2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
   3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
   4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
   5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
   6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
   7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
   8 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1)
   9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
  10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
  11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
  15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
  16 centos:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), clang version 7.0.1 (tags/RELEASE_701/final)
  17 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20191101 gcc-9-branch@277702, clang version 9.0.0 (tags/RELEASE_900/final)
  18 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
  19 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
  20 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final)
  21 debian:experimental : Ok gcc (Debian 9.2.1-9) 9.2.1 20191008, clang version 8.0.1-3+b1 (tags/RELEASE_801/final)
  22 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
  23 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
  24 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-19) 8.3.0
  25 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
  26 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
  27 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
  28 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
  29 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
  30 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  31 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
  32 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
  33 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
  34 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
  35 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
  36 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30)
  37 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
  38 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
  39 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc31)
  40 fedora:32 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32)
  41 fedora:rawhide : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32)
  42 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0
  43 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
  44 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
  45 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7)
  46 manjaro:latest : FAIL gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final)
  47 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548)
  48 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238)
  49 opensuse:15.2 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 7.0.1 (tags/RELEASE_701/final 349238)
  50 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553)
  51 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190903 [gcc-9-branch revision 275330], clang version 9.0.0 (tags/RELEASE_900/final 372316)
  52 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
  53 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
  54 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final)
  55 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
  56 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4)
  57 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
  58 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  59 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  60 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  61 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  62 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  63 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  64 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
  65 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
  66 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
  67 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  68 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  69 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  70 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  71 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  72 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  73 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  74 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  75 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final)
  76 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final)
  77 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
  78 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0
  79 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
  80 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final)
  #

  # uname -a
  Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  # git log --oneline -1
  7fa46cbf20d3 perf report: Sort by sampled cycles percent per block for tui
  # perf version --build-options
  perf version 5.4.rc5.g7fa46cbf20d3
  dwarf: [ on ] # HAVE_DWARF_SUPPORT
  dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT
  glibc: [ on ] # HAVE_GLIBC_SUPPORT
  gtk2: [ on ] # HAVE_GTK2_SUPPORT
  syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT
  libbfd: [ on ] # HAVE_LIBBFD_SUPPORT
  libelf: [ on ] # HAVE_LIBELF_SUPPORT
  libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
  numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
  libperl: [ on ] # HAVE_LIBPERL_SUPPORT
  libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
  libslang: [ on ] # HAVE_SLANG_SUPPORT
  libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT
  libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT
  libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT
  zlib: [ on ] # HAVE_ZLIB_SUPPORT
  lzma: [ on ] # HAVE_LZMA_SUPPORT
  get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT
  bpf: [ on ] # HAVE_LIBBPF_SUPPORT
  aio: [ on ] # HAVE_AIO_SUPPORT
  zstd: [ on ] # HAVE_ZSTD_SUPPORT
  # perf test
   1: vmlinux symtab matches kallsyms : Ok
   2: Detect openat syscall event : Ok
   3: Detect openat syscall event on all cpus : Ok
   4: Read samples using the mmap interface : Ok
   5: Test data source output : Ok
   6: Parse event definition strings : Ok
   7: Simple expression parser : Ok
   8: PERF_RECORD_* events & perf_sample fields : Ok
   9: Parse perf pmu format : Ok
  10: DSO data read : Ok
  11: DSO data cache : Ok
  12: DSO data reopen : Ok
  13: Roundtrip evsel->name : Ok
  14: Parse sched tracepoints fields : Ok
  15: syscalls:sys_enter_openat event fields : Ok
  16: Setup struct perf_event_attr : Ok
  17: Match and link multiple hists : Ok
  18: 'import perf' in python : Ok
  19: Breakpoint overflow signal handler : Ok
  20: Breakpoint overflow sampling : Ok
  21: Breakpoint accounting : Ok
  22: Watchpoint :
  22.1: Read Only Watchpoint : Skip
  22.2: Write Only Watchpoint : Ok
  22.3: Read / Write Watchpoint : Ok
  22.4: Modify Watchpoint : Ok
  23: Number of exit events of a simple workload : Ok
  24: Software clock events period values : Ok
  25: Object code reading : Ok
  26: Sample parsing : Ok
  27: Use a dummy software event to keep tracking : Ok
  28: Parse with no sample_id_all bit set : Ok
  29: Filter hist entries : Ok
  30: Lookup mmap thread : Ok
  31: Share thread mg : Ok
  32: Sort output of hist entries : Ok
  33: Cumulate child hist entries : Ok
  34: Track with sched_switch : Ok
  35: Filter fds with revents mask in a fdarray : Ok
  36: Add fd to a fdarray, making it autogrow : Ok
  37: kmod_path__parse : Ok
  38: Thread map : Ok
  39: LLVM search and compile :
  39.1: Basic BPF llvm compile : Ok
  39.2: kbuild searching : Ok
  39.3: Compile source for BPF prologue generation : Ok
  39.4: Compile source for BPF relocation : Ok
  40: Session topology : Ok
  41: BPF filter :
  41.1: Basic BPF filtering : Ok
  41.2: BPF pinning : Ok
  41.3: BPF prologue generation : Ok
  41.4: BPF relocation checker : Ok
  42: Synthesize thread map : Ok
  43: Remove thread map : Ok
  44: Synthesize cpu map : Ok
  45: Synthesize stat config : Ok
  46: Synthesize stat : Ok
  47: Synthesize stat round : Ok
  48: Synthesize attr update : Ok
  49: Event times : Ok
  50: Read backward ring buffer : Ok
  51: Print cpu map : Ok
  52: Probe SDT events : Ok
  53: is_printable_array : Ok
  54: Print bitmap : Ok
  55: perf hooks : Ok
  56: builtin clang support : Skip (not compiled in)
  57: unit_number__scnprintf : Ok
  58: mem2node : Ok
  59: time utils : Ok
  60: map_groups__merge_in : Ok
  61: x86 rdpmc : Ok
  62: Convert perf time to TSC : Ok
  63: DWARF unwind : Ok
  64: x86 instruction decoder - new instructions : Ok
  65: Intel PT packet decoder : Ok
  66: x86 bp modify : Ok
  67: probe libc's inet_pton & backtrace it with ping : Ok
  68: Use vfs_getname probe to get syscall args filenames : Ok
  69: Add vfs_getname probe to get syscall args filenames : Ok
  70: Check open filename arg using perf trace + vfs_getname: Ok
  71: Zstd perf.data compression/decompression : Ok
  #

  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/perf/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
  make_util_pmu_bison_o_O: make util/pmu-bison.o
  make_no_libbionic_O: make NO_LIBBIONIC=1
  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
  make_no_auxtrace_O: make NO_AUXTRACE=1
  make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
  make_no_libpython_O: make NO_LIBPYTHON=1
  make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1
  make_help_O: make help
  make_with_babeltrace_O: make LIBBABELTRACE=1
  make_no_slang_O: make NO_SLANG=1
  make_no_backtrace_O: make NO_BACKTRACE=1
  make_install_prefix_slash_O: make install prefix=/tmp/krava/
  make_no_gtk2_O: make NO_GTK2=1
  make_no_libbpf_O: make NO_LIBBPF=1
  make_doc_O: make doc
  make_install_O: make install
  make_install_prefix_O: make install prefix=/tmp/krava
  make_debug_O: make DEBUG=1
  make_no_libaudit_O: make NO_LIBAUDIT=1
  make_util_map_o_O: make util/map.o
  make_no_libnuma_O: make NO_LIBNUMA=1
  make_tags_O: make tags
  make_no_libperl_O: make NO_LIBPERL=1
  make_install_bin_O: make install-bin
  make_cscope_O: make cscope
  make_with_clangllvm_O: make LIBCLANGLLVM=1
  make_no_libelf_O: make NO_LIBELF=1
  make_no_libunwind_O: make NO_LIBUNWIND=1
  make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1
  make_no_newt_O: make NO_NEWT=1
  make_clean_all_O: make clean all
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
  make_perf_o_O: make perf.o
  make_pure_O: make
  make_no_demangle_O: make NO_DEMANGLE=1
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $
* [PATCH 01/63] perf data: Correctly identify directory data files
  2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
@ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo
  2019-11-07 18:59 ` [PATCH 02/63] perf data: Move perf_dir_version into data.h Arnaldo Carvalho de Melo
                   ` (61 subsequent siblings)
  62 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users,
    Adrian Hunter, Arnaldo Carvalho de Melo

From: Adrian Hunter <adrian.hunter@intel.com>

In order to rename the "header" file to "data" without conflicting,
correctly identify the non-header files as starting with "data."

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20191004083121.12182-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index 88fba2ba549f..8993253c5564 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -96,7 +96,7 @@ int perf_data__open_dir(struct perf_data *data)
 		if (stat(path, &st))
 			continue;
 
-		if (!S_ISREG(st.st_mode) || strncmp(dent->d_name, "data", 4))
+		if (!S_ISREG(st.st_mode) || strncmp(dent->d_name, "data.", 5))
 			continue;
 
 		ret = -ENOMEM;
-- 
2.21.0
* [PATCH 02/63] perf data: Move perf_dir_version into data.h
  2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
  2019-11-07 18:59 ` [PATCH 01/63] perf data: Correctly identify directory data files Arnaldo Carvalho de Melo
@ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo
  2019-11-07 18:59 ` [PATCH 03/63] perf data: Rename directory "header" file to "data" Arnaldo Carvalho de Melo
                   ` (60 subsequent siblings)
  62 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users,
    Adrian Hunter, Arnaldo Carvalho de Melo

From: Adrian Hunter <adrian.hunter@intel.com>

perf_dir_version belongs to struct perf_data which is declared in data.h.
To allow its use in inline perf_data functions, move perf_dir_version to
data.h

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20191004083121.12182-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/data.h   | 4 ++++
 tools/perf/util/header.h | 4 ----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index 259868a39019..218fe9a16801 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -9,6 +9,10 @@ enum perf_data_mode {
 	PERF_DATA_MODE_READ,
 };
 
+enum perf_dir_version {
+	PERF_DIR_VERSION = 1,
+};
+
 struct perf_data_file {
 	char *path;
 	int fd;
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index ca53a929e9fd..840f95cee349 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -52,10 +52,6 @@ enum perf_header_version {
 	PERF_HEADER_VERSION_2,
 };
 
-enum perf_dir_version {
-	PERF_DIR_VERSION = 1,
-};
-
 struct perf_file_section {
 	u64 offset;
 	u64 size;
-- 
2.21.0
* [PATCH 03/63] perf data: Rename directory "header" file to "data" 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 01/63] perf data: Correctly identify directory data files Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 02/63] perf data: Move perf_dir_version into data.h Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 04/63] perf session: Fix indent in perf_session__new()" Arnaldo Carvalho de Melo ` (59 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Arnaldo Carvalho de Melo From: Adrian Hunter <adrian.hunter@intel.com> In preparation for supporting a single-file directory format, rename "header" to "data", because "header" is a misleading name when there is only one file. Note that in the multi-file case, the "header" file also contains data. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191004083121.12182-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/data.c | 2 +- tools/perf/util/util.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c index 8993253c5564..df173f0bf654 100644 --- a/tools/perf/util/data.c +++ b/tools/perf/util/data.c @@ -306,7 +306,7 @@ static int open_dir(struct perf_data *data) * So far we open only the header, so we can read the data version and * layout.
*/ - if (asprintf(&data->file.path, "%s/header", data->path) < 0) + if (asprintf(&data->file.path, "%s/data", data->path) < 0) return -1; if (perf_data__is_write(data) && diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c index ae56c766eda1..3096654377c2 100644 --- a/tools/perf/util/util.c +++ b/tools/perf/util/util.c @@ -185,7 +185,7 @@ static int rm_rf_depth_pat(const char *path, int depth, const char **pat) int rm_rf_perf_data(const char *path) { const char *pat[] = { - "header", + "data", "data.*", NULL, }; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 04/63] perf session: Fix indent in perf_session__new()" 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (2 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 03/63] perf data: Rename directory "header" file to "data" Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 05/63] perf data: Support single perf.data file directory Arnaldo Carvalho de Melo ` (58 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Jiri Olsa, Adrian Hunter, Arnaldo Carvalho de Melo From: Jiri Olsa <jolsa@redhat.com> Fix up indentation. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lore.kernel.org/lkml/20191007112027.GD6919@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/session.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 6cc32f5ec043..0266604b8bc2 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -227,8 +227,8 @@ struct perf_session *perf_session__new(struct perf_data *data, /* Open the directory data. */ if (data->is_dir) { ret = perf_data__open_dir(data); - if (ret) - goto out_delete; + if (ret) + goto out_delete; } } } else { -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 05/63] perf data: Support single perf.data file directory 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (3 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 04/63] perf session: Fix indent in perf_session__new()" Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 06/63] perf record: Put a copy of kcore into the perf.data directory Arnaldo Carvalho de Melo ` (57 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Arnaldo Carvalho de Melo From: Adrian Hunter <adrian.hunter@intel.com> Support directory output that contains a regular perf.data file, named "data". By default the directory is named perf.data i.e. perf.data └── data Most of the infrastructure to support a directory is already there. This patch makes the changes needed to support the format above. Presently there is no 'perf record' option to output a directory. This is preparation for adding support for putting a copy of /proc/kcore in the directory. 
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191004083121.12182-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- .../perf.data-directory-format.txt | 28 +++++++++++++++++++ tools/perf/builtin-record.c | 2 +- tools/perf/util/data.c | 9 +++++- tools/perf/util/data.h | 6 ++++ 4 files changed, 43 insertions(+), 2 deletions(-) create mode 100644 tools/perf/Documentation/perf.data-directory-format.txt diff --git a/tools/perf/Documentation/perf.data-directory-format.txt b/tools/perf/Documentation/perf.data-directory-format.txt new file mode 100644 index 000000000000..4bf08908178d --- /dev/null +++ b/tools/perf/Documentation/perf.data-directory-format.txt @@ -0,0 +1,28 @@ +perf.data directory format + +DISCLAIMER This is not ABI yet and is subject to possible change + in following versions of perf. We will remove this + disclaimer once the directory format soaks in. + + +This document describes the on-disk perf.data directory format. + +The layout is described by HEADER_DIR_FORMAT feature. +Currently it holds only version number (0): + + HEADER_DIR_FORMAT = 24 + + struct { + uint64_t version; + } + +The current only version value 0 means that: + - there is a single perf.data file named 'data' within the directory. + e.g. + + $ tree -ps perf.data + perf.data + └── [-rw------- 25912] data + +Future versions are expected to describe different data files +layout according to special needs. 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 2fb83aabbef5..e402459752e7 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -537,7 +537,7 @@ static int record__process_auxtrace(struct perf_tool *tool, size_t padding; u8 pad[8] = {0}; - if (!perf_data__is_pipe(data) && !perf_data__is_dir(data)) { + if (!perf_data__is_pipe(data) && perf_data__is_single_file(data)) { off_t file_offset; int fd = perf_data__fd(data); int err; diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c index df173f0bf654..964ea101dba6 100644 --- a/tools/perf/util/data.c +++ b/tools/perf/util/data.c @@ -76,6 +76,13 @@ int perf_data__open_dir(struct perf_data *data) DIR *dir; int nr = 0; + /* + * Directory containing a single regular perf data file which is already + * open, means there is nothing more to do here. + */ + if (perf_data__is_single_file(data)) + return 0; + if (WARN_ON(!data->is_dir)) return -EINVAL; @@ -406,7 +413,7 @@ unsigned long perf_data__size(struct perf_data *data) u64 size = data->file.size; int i; - if (!data->is_dir) + if (perf_data__is_single_file(data)) return size; for (i = 0; i < data->dir.nr; i++) { diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h index 218fe9a16801..f68815f7e428 100644 --- a/tools/perf/util/data.h +++ b/tools/perf/util/data.h @@ -10,6 +10,7 @@ enum perf_data_mode { }; enum perf_dir_version { + PERF_DIR_SINGLE_FILE = 0, PERF_DIR_VERSION = 1, }; @@ -54,6 +55,11 @@ static inline bool perf_data__is_dir(struct perf_data *data) return data->is_dir; } +static inline bool perf_data__is_single_file(struct perf_data *data) +{ + return data->dir.version == PERF_DIR_SINGLE_FILE; +} + static inline int perf_data__fd(struct perf_data *data) { return data->file.fd; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
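A minimal sketch of the version check this patch introduces, for readers who want to try the logic outside the perf tree. The type and function names (dir_version, data_dir, data, data_is_single_file, data_size) and the extra_sizes array are hypothetical, trimmed-down stand-ins, not perf's own definitions. It also assumes that the version field is zero-initialized for a plain file, so a plain perf.data file and a version 0 directory take the same path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for perf's enum perf_dir_version and struct perf_data. */
enum dir_version {
	DIR_SINGLE_FILE = 0,	/* directory holds one regular file named "data" */
	DIR_VERSION     = 1,	/* multi-file layout */
};

struct data_dir {
	int version;
	int nr;			/* number of extra per-directory files */
};

struct data {
	bool is_dir;
	struct data_dir dir;
	unsigned long file_size;		/* size of the main data file */
	const unsigned long *extra_sizes;	/* sizes of the extra files, if any */
};

/* Mirrors the idea of perf_data__is_single_file(): decide on the version only. */
static bool data_is_single_file(const struct data *d)
{
	return d->dir.version == DIR_SINGLE_FILE;
}

/* Mirrors the idea of perf_data__size(): only walk extra files for multi-file layouts. */
static unsigned long data_size(const struct data *d)
{
	unsigned long size = d->file_size;

	if (data_is_single_file(d))
		return size;

	for (int i = 0; i < d->dir.nr; i++)
		size += d->extra_sizes[i];

	return size;
}
```

Keying the check on the version rather than on is_dir is what lets a directory that holds only "data" reuse the existing single-file code paths.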
* [PATCH 06/63] perf record: Put a copy of kcore into the perf.data directory 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (4 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 05/63] perf data: Support single perf.data file directory Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 07/63] perf llvm: Make .o saving a debug message, not an info one Arnaldo Carvalho de Melo ` (56 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Arnaldo Carvalho de Melo From: Adrian Hunter <adrian.hunter@intel.com> Add a new 'perf record' option '--kcore' which will put a copy of /proc/kcore, kallsyms and modules into a perf.data directory. Note, that without the --kcore option, output goes to a file as previously. The tools' -o and -i options work with either a file name or directory name. Example: $ sudo perf record --kcore uname $ sudo tree perf.data perf.data ├── kcore_dir │ ├── kallsyms │ ├── kcore │ └── modules └── data $ sudo perf script -v build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541 Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field. 
Using CPUID GenuineIntel-6-8E-A Using perf.data/kcore_dir/kcore for kernel data Using perf.data/kcore_dir/kallsyms for symbols perf 19058 506778.423729: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423733: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423734: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423736: 117 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux) perf 19058 506778.423738: 2092 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux) perf 19058 506778.423740: 37380 cycles: ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux) uname 19058 506778.423751: 582673 cycles: ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux) uname 19058 506778.423892: 2241841 cycles: ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux) uname 19058 506778.424430: 2457397 cycles: ffffffffa3019232 check_memory_region+0x52 (vmlinux) Committer testing: # rm -rf perf.data* # perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ] # ls -l perf.data -rw-------. 1 root root 34772 Oct 21 11:08 perf.data # perf record --kcore uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ] ls[root@quaco ~]# ls -lad perf.data* drwx------. 3 root root 4096 Oct 21 11:08 perf.data -rw-------. 
1 root root 34772 Oct 21 11:08 perf.data.old # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # perf evlist -v -i perf.data/data cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lore.kernel.org/lkml/20191004083121.12182-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/Documentation/perf-record.txt | 3 ++ .../perf.data-directory-format.txt | 35 +++++++++++++ tools/perf/builtin-record.c | 52 +++++++++++++++++++ tools/perf/util/data.c | 33 ++++++++++++ tools/perf/util/data.h | 2 + tools/perf/util/record.h | 1 + tools/perf/util/session.c | 4 ++ tools/perf/util/util.c | 17 ++++++ 8 files changed, 147 insertions(+) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index c6f9f31b6039..8a4506113d9f 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -571,6 +571,9 @@ config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'. Implies --tail-synthesize. +--kcore:: +Make a copy of /proc/kcore and place it into a directory with the perf data file. 
+ SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-list[1] diff --git a/tools/perf/Documentation/perf.data-directory-format.txt b/tools/perf/Documentation/perf.data-directory-format.txt index 4bf08908178d..f37fbd29112e 100644 --- a/tools/perf/Documentation/perf.data-directory-format.txt +++ b/tools/perf/Documentation/perf.data-directory-format.txt @@ -26,3 +26,38 @@ The current only version value 0 means that: Future versions are expected to describe different data files layout according to special needs. + +Currently the only 'perf record' option to output to a directory is +the --kcore option which puts a copy of /proc/kcore into the directory. +e.g. + + $ sudo perf record --kcore uname + Linux + [ perf record: Woken up 1 times to write data ] + [ perf record: Captured and wrote 0.015 MB perf.data (9 samples) ] + $ sudo tree -ps perf.data + perf.data + ├── [-rw------- 23744] data + └── [drwx------ 4096] kcore_dir + ├── [-r-------- 6731125] kallsyms + ├── [-r-------- 40230912] kcore + └── [-r-------- 5419] modules + + 1 directory, 4 files + $ sudo perf script -v + build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de + build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541 + build id event received for /lib/x86_64-linux-gnu/libc-2.28.so: 5b157f49586a3ca84d55837f97ff466767dd3445 + Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field. 
+ Using CPUID GenuineIntel-6-8E-A + Using perf.data/kcore_dir/kcore for kernel data + Using perf.data/kcore_dir/kallsyms for symbols + perf 15316 2060795.480902: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) + perf 15316 2060795.480906: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) + perf 15316 2060795.480908: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) + perf 15316 2060795.480910: 119 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux) + perf 15316 2060795.480912: 2109 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux) + perf 15316 2060795.480914: 37606 cycles: ffffffffa2f121fe perf_event_addr_filters_exec+0x2e (vmlinux) + uname 15316 2060795.480924: 588287 cycles: ffffffffa303a56d page_counter_try_charge+0x6d (vmlinux) + uname 15316 2060795.481067: 2261945 cycles: ffffffffa301438f kmem_cache_free+0x4f (vmlinux) + uname 15316 2060795.481643: 2172167 cycles: 7f1a48c393c0 _IO_un_link+0x0 (/lib/x86_64-linux-gnu/libc-2.28.so) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index e402459752e7..f6664bb08b26 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -55,6 +55,9 @@ #include <signal.h> #include <sys/mman.h> #include <sys/wait.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> #include <linux/err.h> #include <linux/string.h> #include <linux/time64.h> @@ -699,6 +702,37 @@ static int record__auxtrace_init(struct record *rec __maybe_unused) #endif +static bool record__kcore_readable(struct machine *machine) +{ + char kcore[PATH_MAX]; + int fd; + + scnprintf(kcore, sizeof(kcore), "%s/proc/kcore", machine->root_dir); + + fd = open(kcore, O_RDONLY); + if (fd < 0) + return false; + + close(fd); + + return true; +} + +static int record__kcore_copy(struct machine *machine, struct perf_data *data) +{ + char from_dir[PATH_MAX]; + char kcore_dir[PATH_MAX]; + int ret; + + snprintf(from_dir, sizeof(from_dir), "%s/proc", machine->root_dir); + + ret = 
perf_data__make_kcore_dir(data, kcore_dir, sizeof(kcore_dir)); + if (ret) + return ret; + + return kcore_copy(from_dir, kcore_dir); +} + static int record__mmap_evlist(struct record *rec, struct evlist *evlist) { @@ -1383,6 +1417,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) session->header.env.comp_type = PERF_COMP_ZSTD; session->header.env.comp_level = rec->opts.comp_level; + if (rec->opts.kcore && + !record__kcore_readable(&session->machines.host)) { + pr_err("ERROR: kcore is not readable.\n"); + return -1; + } + record__init_features(rec); if (rec->opts.use_clockid && rec->opts.clockid_res_ns) @@ -1414,6 +1454,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) } session->header.env.comp_mmap_len = session->evlist->core.mmap_len; + if (rec->opts.kcore) { + err = record__kcore_copy(&session->machines.host, data); + if (err) { + pr_err("ERROR: Failed to copy kcore\n"); + goto out_child; + } + } + err = bpf__apply_obj_config(); if (err) { char errbuf[BUFSIZ]; @@ -2184,6 +2232,7 @@ static struct option __record_options[] = { parse_cgroups), OPT_UINTEGER('D', "delay", &record.opts.initial_delay, "ms to wait before starting measurement after program start"), + OPT_BOOLEAN(0, "kcore", &record.opts.kcore, "copy /proc/kcore"), OPT_STRING('u', "uid", &record.opts.target.uid_str, "user", "user to profile"), @@ -2322,6 +2371,9 @@ int cmd_record(int argc, const char **argv) } + if (rec->opts.kcore) + rec->data.is_dir = true; + if (rec->opts.comp_level != 0) { pr_debug("Compression enabled, disabling build id collection at the end of the session.\n"); rec->no_buildid = true; diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c index 964ea101dba6..c47aa34fdc0a 100644 --- a/tools/perf/util/data.c +++ b/tools/perf/util/data.c @@ -424,3 +424,36 @@ unsigned long perf_data__size(struct perf_data *data) return size; } + +int perf_data__make_kcore_dir(struct perf_data *data, char *buf, size_t buf_sz) +{ + int ret; + 
+ if (!data->is_dir) + return -1; + + ret = snprintf(buf, buf_sz, "%s/kcore_dir", data->path); + if (ret < 0 || (size_t)ret >= buf_sz) + return -1; + + return mkdir(buf, S_IRWXU); +} + +char *perf_data__kallsyms_name(struct perf_data *data) +{ + char *kallsyms_name; + struct stat st; + + if (!data->is_dir) + return NULL; + + if (asprintf(&kallsyms_name, "%s/kcore_dir/kallsyms", data->path) < 0) + return NULL; + + if (stat(kallsyms_name, &st)) { + free(kallsyms_name); + return NULL; + } + + return kallsyms_name; +} diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h index f68815f7e428..75947ef6bc17 100644 --- a/tools/perf/util/data.h +++ b/tools/perf/util/data.h @@ -87,4 +87,6 @@ int perf_data__open_dir(struct perf_data *data); void perf_data__close_dir(struct perf_data *data); int perf_data__update_dir(struct perf_data *data); unsigned long perf_data__size(struct perf_data *data); +int perf_data__make_kcore_dir(struct perf_data *data, char *buf, size_t buf_sz); +char *perf_data__kallsyms_name(struct perf_data *data); #endif /* __PERF_DATA_H */ diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h index 00275afc524d..948bbcf9aef3 100644 --- a/tools/perf/util/record.h +++ b/tools/perf/util/record.h @@ -44,6 +44,7 @@ struct record_opts { bool strict_freq; bool sample_id; bool no_bpf_event; + bool kcore; unsigned int freq; unsigned int mmap_pages; unsigned int auxtrace_mmap_pages; diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 0266604b8bc2..f07b8ecb91bc 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -230,6 +230,10 @@ struct perf_session *perf_session__new(struct perf_data *data, if (ret) goto out_delete; } + + if (!symbol_conf.kallsyms_name && + !symbol_conf.vmlinux_name) + symbol_conf.kallsyms_name = perf_data__kallsyms_name(data); } } else { session->machines.host.env = &perf_env; diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c index 3096654377c2..969ae560dad9 100644 --- 
a/tools/perf/util/util.c +++ b/tools/perf/util/util.c @@ -182,6 +182,21 @@ static int rm_rf_depth_pat(const char *path, int depth, const char **pat) return rmdir(path); } +static int rm_rf_kcore_dir(const char *path) +{ + char kcore_dir_path[PATH_MAX]; + const char *pat[] = { + "kcore", + "kallsyms", + "modules", + NULL, + }; + + snprintf(kcore_dir_path, sizeof(kcore_dir_path), "%s/kcore_dir", path); + + return rm_rf_depth_pat(kcore_dir_path, 0, pat); +} + int rm_rf_perf_data(const char *path) { const char *pat[] = { @@ -190,6 +205,8 @@ int rm_rf_perf_data(const char *path) NULL, }; + rm_rf_kcore_dir(path); + return rm_rf_depth_pat(path, 0, pat); } -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
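The path construction in perf_data__make_kcore_dir() relies on snprintf() returning the length it would have written, so a result greater than or equal to the buffer size means truncation. A small standalone sketch of that check follows; build_kcore_dir_path is a hypothetical helper name used only for illustration, and the mkdir() step from the patch is left out.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: writes "<base>/kcore_dir" into buf.
 * Returns 0 on success, -1 on an encoding error or if the
 * result did not fit (snprintf would have truncated it).
 */
static int build_kcore_dir_path(const char *base, char *buf, size_t buf_sz)
{
	int ret = snprintf(buf, buf_sz, "%s/kcore_dir", base);

	/* ret is the length needed, excluding the terminating NUL */
	if (ret < 0 || (size_t)ret >= buf_sz)
		return -1;

	return 0;
}
```

With base "perf.data" the result is 19 characters long, so a 20 byte buffer is the smallest that succeeds.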
* [PATCH 07/63] perf llvm: Make .o saving a debug message, not an info one 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (5 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 06/63] perf record: Put a copy of kcore into the perf.data directory Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 08/63] perf cs-etm: Fix definition of macro TO_CS_QUEUE_NR Arnaldo Carvalho de Melo ` (55 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Daniel Bristot de Oliveira, David Ahern, Luis Cláudio Gonçalves From: Arnaldo Carvalho de Melo <acme@redhat.com> It's a bit annoying to have that message, so make it a debug one, i.e. this message will now only appear when using '-v': [root@quaco tracebuffer]# trace -e bristot.c LLVM: dumping bristot.o ^C[root@quaco tracebuffer]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-o7jd4i7s66kosec5torubqps@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/llvm-utils.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c index 8b14e4a7f1dc..eae47c2509eb 100644 --- a/tools/perf/util/llvm-utils.c +++ b/tools/perf/util/llvm-utils.c @@ -418,10 +418,9 @@ void llvm__dump_obj(const char *path, void *obj_buf, size_t size) goto out; } - pr_info("LLVM: dumping %s\n", obj_path); + pr_debug("LLVM: dumping %s\n", obj_path); if (fwrite(obj_buf, size, 1, fp) != 1) -
pr_warning("WARNING: failed to write to file '%s': %s, skip object dumping\n", - obj_path, strerror(errno)); + pr_debug("WARNING: failed to write to file '%s': %s, skip object dumping\n", obj_path, strerror(errno)); fclose(fp); out: free(obj_path); -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 08/63] perf cs-etm: Fix definition of macro TO_CS_QUEUE_NR 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (6 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 07/63] perf llvm: Make .o saving a debug message, not an info one Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 09/63] perf evsel: Always preserve errno while cleaning up perf_event_open failures Arnaldo Carvalho de Melo ` (54 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Leo Yan, Mathieu Poirier, Alexander Shishkin, Jiri Olsa, Mark Rutland, Peter Zijlstra, Suzuki Poulouse, coresight ml, linux-arm-kernel, Arnaldo Carvalho de Melo From: Leo Yan <leo.yan@linaro.org> The definition of the macro TO_CS_QUEUE_NR has a typo: it names its parameter 'trace_id_chan', which does not match the macro body, which uses 'trace_chan_id'. So rename the parameter to 'trace_chan_id'. By luck, the function cs_etm__setup_queue() has a local variable 'trace_chan_id', so even with the wrongly defined macro the body picked up that local variable rather than the macro's parameter 'trace_id_chan', and the compiler did not complain. After renaming the parameter, the build fails because cs_etm__setup_queue() has no variable 'trace_id_chan'. This patch passes the variable 'trace_chan_id' to the macro, which fixes the build error.
Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20191021074808.25795-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/cs-etm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 4ba0f871f086..f5f855fff412 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -110,7 +110,7 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq); * encode the etm queue number as the upper 16 bit and the channel as * the lower 16 bit. */ -#define TO_CS_QUEUE_NR(queue_nr, trace_id_chan) \ +#define TO_CS_QUEUE_NR(queue_nr, trace_chan_id) \ (queue_nr << 16 | trace_chan_id) #define TO_QUEUE_NR(cs_queue_nr) (cs_queue_nr >> 16) #define TO_TRACE_CHAN_ID(cs_queue_nr) (cs_queue_nr & 0x0000ffff) @@ -819,7 +819,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, * Note that packets decoded above are still in the traceID's packet * queue and will be processed in cs_etm__process_queues(). */ - cs_queue_nr = TO_CS_QUEUE_NR(queue_nr, trace_id_chan); + cs_queue_nr = TO_CS_QUEUE_NR(queue_nr, trace_chan_id); ret = auxtrace_heap__add(&etm->heap, cs_queue_nr, timestamp); out: return ret; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
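To illustrate why the compiler stayed silent before this fix, here is a small standalone sketch. The BROKEN_/FIXED_ macro names and the two wrapper functions are hypothetical and exist only for this example; the macro bodies follow the shape shown in the diff. When a macro body names an identifier that is not one of its parameters, the preprocessor leaves it alone, and it then resolves against whatever variable of that name is in scope at the call site.

```c
/*
 * Broken shape: the parameter is 'trace_id_chan' but the body uses
 * 'trace_chan_id', so the second argument is silently ignored.
 */
#define BROKEN_TO_CS_QUEUE_NR(queue_nr, trace_id_chan) \
	(queue_nr << 16 | trace_chan_id)

/* Fixed shape: parameter name and body agree. */
#define FIXED_TO_CS_QUEUE_NR(queue_nr, trace_chan_id) \
	(queue_nr << 16 | trace_chan_id)

static unsigned int broken_pack(unsigned int queue_nr, unsigned int arg)
{
	/* Unrelated local that the broken macro body captures by name. */
	unsigned int trace_chan_id = 7;

	/* 'arg' is dropped during expansion; the local above is used instead. */
	return BROKEN_TO_CS_QUEUE_NR(queue_nr, arg);
}

static unsigned int fixed_pack(unsigned int queue_nr, unsigned int arg)
{
	/* Here the second argument really ends up in the low 16 bits. */
	return FIXED_TO_CS_QUEUE_NR(queue_nr, arg);
}
```

Calling broken_pack(1, 3) yields the channel value 7 in the low bits, while fixed_pack(1, 3) yields 3, which is the kind of mismatch the original typo could have hidden had the local variable held a different value.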
* [PATCH 09/63] perf evsel: Always preserve errno while cleaning up perf_event_open failures 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (7 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 08/63] perf cs-etm: Fix definition of macro TO_CS_QUEUE_NR Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 10/63] perf evsel: Avoid close(-1) Arnaldo Carvalho de Melo ` (53 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Andi Kleen, Kan Liang, Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo From: Andi Kleen <ak@linux.intel.com> In some cases when perf_event_open fails, it may do some closes to clean up. In special cases these closes can fail too, which overwrites the errno of the perf_event_open, which is then incorrectly reported. Save/restore errno around closes. 
Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20191020175202.32456-2-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/evsel.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index abc7fda4a0fe..d831038b55f2 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1574,7 +1574,7 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, { int cpu, thread, nthreads; unsigned long flags = PERF_FLAG_FD_CLOEXEC; - int pid = -1, err; + int pid = -1, err, old_errno; enum { NO_CHANGE, SET_TO_MAX, INCREASED_MAX } set_rlimit = NO_CHANGE; if ((perf_missing_features.write_backward && evsel->core.attr.write_backward) || @@ -1727,8 +1727,8 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, */ if (err == -EMFILE && set_rlimit < INCREASED_MAX) { struct rlimit l; - int old_errno = errno; + old_errno = errno; if (getrlimit(RLIMIT_NOFILE, &l) == 0) { if (set_rlimit == NO_CHANGE) l.rlim_cur = l.rlim_max; @@ -1812,6 +1812,7 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, if (err) threads->err_thread = thread; + old_errno = errno; do { while (--thread >= 0) { close(FD(evsel, cpu, thread)); @@ -1819,6 +1820,7 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, } thread = nthreads; } while (--cpu >= 0); + errno = old_errno; return err; } -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
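The save/restore pattern described above can be sketched in isolation as follows. The helper name close_all_preserving_errno and its array-based interface are assumptions made for this example; the patch itself applies the same idea inline around the cleanup loop in evsel__open().

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical cleanup helper: closes n descriptors and marks them
 * as closed, while keeping the errno value the caller came in with.
 */
static void close_all_preserving_errno(int *fds, int n)
{
	int saved_errno = errno;	/* e.g. the errno left by a failed open */

	for (int i = 0; i < n; i++) {
		close(fds[i]);		/* may fail and overwrite errno (EBADF, ...) */
		fds[i] = -1;
	}

	errno = saved_errno;		/* report the original failure, not close()'s */
}
```

Without the final assignment, a failing close() during cleanup would replace the error code that explains why the open failed in the first place.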
* [PATCH 10/63] perf evsel: Avoid close(-1) 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (8 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 09/63] perf evsel: Always preserve errno while cleaning up perf_event_open failures Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 11/63] perf tools: Move ALLOC_LIST into a function Arnaldo Carvalho de Melo ` (52 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Andi Kleen, Kan Liang, Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo From: Andi Kleen <ak@linux.intel.com> In some weak fallback cases close can be called a lot with -1. Check for this case and avoid calling close then. This is mainly to shut up valgrind which complains about this case. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20191020175202.32456-3-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/lib/evsel.c | 3 ++- tools/perf/util/evsel.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/perf/lib/evsel.c b/tools/perf/lib/evsel.c index a8cb582e2721..5a89857b0381 100644 --- a/tools/perf/lib/evsel.c +++ b/tools/perf/lib/evsel.c @@ -120,7 +120,8 @@ void perf_evsel__close_fd(struct perf_evsel *evsel) for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) { - close(FD(evsel, cpu, thread)); + if (FD(evsel, cpu, thread) >= 0) + close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } } diff --git a/tools/perf/util/evsel.c 
b/tools/perf/util/evsel.c index d831038b55f2..d4451846af93 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1815,7 +1815,8 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, old_errno = errno; do { while (--thread >= 0) { - close(FD(evsel, cpu, thread)); + if (FD(evsel, cpu, thread) >= 0) + close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } thread = nthreads; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
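A minimal sketch of the guard added by this patch, written against a flat array instead of perf's FD(evsel, cpu, thread) table. The function name close_valid_fds and the returned count are hypothetical additions for illustration only.

```c
#include <unistd.h>

/*
 * Hypothetical helper: closes only the slots that hold a real
 * descriptor, resets every slot to -1, and returns how many
 * descriptors were actually passed to close().
 */
static int close_valid_fds(int *fds, int n)
{
	int closed = 0;

	for (int i = 0; i < n; i++) {
		if (fds[i] >= 0) {	/* skip slots that were never opened */
			close(fds[i]);
			closed++;
		}
		fds[i] = -1;
	}

	return closed;
}
```

Because every slot is reset to -1, calling the helper a second time is a no-op, which also avoids the close(-1) calls that valgrind reports.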
* [PATCH 11/63] perf tools: Move ALLOC_LIST into a function
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Ian Rogers, Adrian Hunter, Alexander Shishkin,
      Alexei Starovoitov, Andi Kleen, Daniel Borkmann, Jin Yao,
      John Garry, Kan Liang, Mark Rutland, Martin KaFai Lau,
      Peter Zijlstra, Song Liu, Stephane Eranian, Yonghong Song, bpf, cl

From: Ian Rogers <irogers@google.com>

Having a YYABORT in a macro makes it hard to free memory for components
of a rule. Separate the logic out.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: clang-built-linux@googlegroups.com
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20191023005337.196160-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.y | 65 ++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 48126ae4cd13..5863acb34780 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -25,12 +25,17 @@ do { \
 		YYABORT; \
 } while (0)

-#define ALLOC_LIST(list) \
-do { \
-	list = malloc(sizeof(*list)); \
-	ABORT_ON(!list); \
-	INIT_LIST_HEAD(list); \
-} while (0)
+static struct list_head* alloc_list()
+{
+	struct list_head *list;
+
+	list = malloc(sizeof(*list));
+	if (!list)
+		return NULL;
+
+	INIT_LIST_HEAD(list);
+	return list;
+}

 static void inc_group_count(struct list_head *list,
 		       struct parse_events_state *parse_state)
@@ -238,7 +243,8 @@ PE_NAME opt_pmu_config
 	if (error)
 		error->idx = @1.first_column;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	if (parse_events_add_pmu(_parse_state, list, $1, $2, false, false)) {
 		struct perf_pmu *pmu = NULL;
 		int ok = 0;
@@ -306,7 +312,8 @@ value_sym '/' event_config '/'
 	int type = $1 >> 16;
 	int config = $1 & 255;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_numeric(_parse_state, list, type, config, $3));
 	parse_events_terms__delete($3);
 	$$ = list;
@@ -318,7 +325,8 @@ value_sym sep_slash_slash_dc
 	int type = $1 >> 16;
 	int config = $1 & 255;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_numeric(_parse_state, list, type, config, NULL));
 	$$ = list;
 }
@@ -327,7 +335,8 @@ PE_VALUE_SYM_TOOL sep_slash_slash_dc
 {
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_tool(_parse_state, list, $1));
 	$$ = list;
 }
@@ -339,7 +348,8 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_e
 	struct parse_events_error *error = parse_state->error;
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6));
 	parse_events_terms__delete($6);
 	$$ = list;
@@ -351,7 +361,8 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT opt_event_config
 	struct parse_events_error *error = parse_state->error;
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4));
 	parse_events_terms__delete($4);
 	$$ = list;
@@ -363,7 +374,8 @@ PE_NAME_CACHE_TYPE opt_event_config
 	struct parse_events_error *error = parse_state->error;
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2));
 	parse_events_terms__delete($2);
 	$$ = list;
@@ -375,7 +387,8 @@ PE_PREFIX_MEM PE_VALUE '/' PE_VALUE ':' PE_MODIFIER_BP sep_dc
 	struct parse_events_state *parse_state = _parse_state;
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_breakpoint(list, &parse_state->idx,
 					     (void *) $2, $6, $4));
 	$$ = list;
@@ -386,7 +399,8 @@ PE_PREFIX_MEM PE_VALUE '/' PE_VALUE sep_dc
 	struct parse_events_state *parse_state = _parse_state;
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_breakpoint(list, &parse_state->idx,
 					     (void *) $2, NULL, $4));
 	$$ = list;
@@ -397,7 +411,8 @@ PE_PREFIX_MEM PE_VALUE ':' PE_MODIFIER_BP sep_dc
 	struct parse_events_state *parse_state = _parse_state;
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_breakpoint(list, &parse_state->idx,
 					     (void *) $2, $4, 0));
 	$$ = list;
@@ -408,7 +423,8 @@ PE_PREFIX_MEM PE_VALUE sep_dc
 	struct parse_events_state *parse_state = _parse_state;
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_breakpoint(list, &parse_state->idx,
 					     (void *) $2, NULL, 0));
 	$$ = list;
@@ -421,7 +437,8 @@ tracepoint_name opt_event_config
 	struct parse_events_error *error = parse_state->error;
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	if (error)
 		error->idx = @1.first_column;

@@ -457,7 +474,8 @@ PE_VALUE ':' PE_VALUE opt_event_config
 {
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_numeric(_parse_state, list, (u32)$1, $3, $4));
 	parse_events_terms__delete($4);
 	$$ = list;
@@ -468,7 +486,8 @@ PE_RAW opt_event_config
 {
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_add_numeric(_parse_state, list, PERF_TYPE_RAW, $1, $2));
 	parse_events_terms__delete($2);
 	$$ = list;
@@ -480,7 +499,8 @@ PE_BPF_OBJECT opt_event_config
 	struct parse_events_state *parse_state = _parse_state;
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_load_bpf(parse_state, list, $1, false, $2));
 	parse_events_terms__delete($2);
 	$$ = list;
@@ -490,7 +510,8 @@ PE_BPF_SOURCE opt_event_config
 {
 	struct list_head *list;

-	ALLOC_LIST(list);
+	list = alloc_list();
+	ABORT_ON(!list);
 	ABORT_ON(parse_events_load_bpf(_parse_state, list, $1, true, $2));
 	parse_events_terms__delete($2);
 	$$ = list;
--
2.21.0
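
To illustrate why returning NULL is easier to work with than a macro that jumps out via YYABORT, here is a small standalone sketch. It is not the parser code: struct list_head and init_list_head() are simplified stand-ins for the kernel list helpers, and rule_action() is a hypothetical caller that owns a heap-allocated name.

```c
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Same shape as alloc_list() in the patch: report failure, do not abort. */
static struct list_head *alloc_list(void)
{
	struct list_head *list = malloc(sizeof(*list));

	if (!list)
		return NULL;
	init_list_head(list);
	return list;
}

/*
 * Hypothetical rule action that takes ownership of "name". Because
 * alloc_list() returns NULL instead of leaving the function, the error
 * path gets a chance to free "name" before reporting the failure.
 */
static int rule_action(char *name, struct list_head **out)
{
	struct list_head *list = alloc_list();

	if (!list) {
		free(name);	/* cleanup that a YYABORT inside a macro would skip */
		return -1;
	}
	/* ... a real action would use "name" to populate the list here ... */
	free(name);
	*out = list;
	return 0;
}
```

In the patch itself the callers still follow alloc_list() with ABORT_ON(!list); the function form is what makes rule-specific cleanup possible later.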
* [PATCH 12/63] perf tools: Avoid a malloc() for array events
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Ian Rogers, Adrian Hunter, Alexander Shishkin,
      Alexei Starovoitov, Andi Kleen, Daniel Borkmann, Jin Yao,
      John Garry, Kan Liang, Mark Rutland, Martin KaFai Lau,
      Peter Zijlstra, Song Liu, Stephane Eranian, Yonghong Song, bpf, cl

From: Ian Rogers <irogers@google.com>

Use realloc() rather than malloc()+memcpy() to possibly avoid a memory
allocation when appending array elements.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: clang-built-linux@googlegroups.com
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20191023005337.196160-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.y | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 5863acb34780..ffa1a1b63796 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -689,14 +689,12 @@ array_terms ',' array_term
 	struct parse_events_array new_array;

 	new_array.nr_ranges = $1.nr_ranges + $3.nr_ranges;
-	new_array.ranges = malloc(sizeof(new_array.ranges[0]) *
-				  new_array.nr_ranges);
+	new_array.ranges = realloc($1.ranges,
+				sizeof(new_array.ranges[0]) *
+				new_array.nr_ranges);
 	ABORT_ON(!new_array.ranges);
-	memcpy(&new_array.ranges[0], $1.ranges,
-	       $1.nr_ranges * sizeof(new_array.ranges[0]));
 	memcpy(&new_array.ranges[$1.nr_ranges], $3.ranges,
 	       $3.nr_ranges * sizeof(new_array.ranges[0]));
-	free($1.ranges);
 	free($3.ranges);
 	$$ = new_array;
 }
--
2.21.0
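
The same append-with-realloc() idea can be sketched outside the grammar file. This is an illustration under assumptions, not the perf implementation: struct range_array is a hypothetical stand-in for struct parse_events_array, with plain int elements.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct parse_events_array (assumption). */
struct range_array {
	size_t nr_ranges;
	int *ranges;
};

/*
 * Append the elements of src to dst, growing dst->ranges with realloc()
 * so the existing elements are not copied by hand.
 * Returns 0 on success, -1 on allocation failure; on failure dst is left
 * untouched and is still owned by the caller.
 */
static int range_array_append(struct range_array *dst,
			      const struct range_array *src)
{
	size_t nr = dst->nr_ranges + src->nr_ranges;
	int *grown;

	if (src->nr_ranges == 0)
		return 0;	/* nothing to append */

	grown = realloc(dst->ranges, nr * sizeof(*grown));
	if (!grown)
		return -1;

	/* Only the new tail needs copying; realloc() kept the old head. */
	memcpy(&grown[dst->nr_ranges], src->ranges,
	       src->nr_ranges * sizeof(*grown));
	dst->ranges = grown;
	dst->nr_ranges = nr;
	return 0;
}
```

The sketch keeps the realloc() result in a temporary so the old buffer is not lost on failure; the grammar action in the patch instead aborts the parse through ABORT_ON() in that case.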
* [PATCH 13/63] perf tests: Fix a typo
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Leo Yan, Will Deacon, Alexander Shishkin,
      Florian Fainelli, Jiri Olsa, Mark Rutland, Peter Zijlstra,
      Arnaldo Carvalho de Melo

From: Leo Yan <leo.yan@linaro.org>

Correct typo in comment: s/suck/stuck.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reported-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191023083324.12093-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/bp_signal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/bp_signal.c b/tools/perf/tests/bp_signal.c
index 166f411568a5..415903b48578 100644
--- a/tools/perf/tests/bp_signal.c
+++ b/tools/perf/tests/bp_signal.c
@@ -295,7 +295,7 @@ bool test__bp_signal_is_supported(void)
 	 * breakpointed instruction.
 	 *
 	 * Since arm64 has the same issue with arm for the single-step
-	 * handling, this case also gets suck on the breakpointed
+	 * handling, this case also gets stuck on the breakpointed
 	 * instruction.
 	 *
 	 * Just disable the test for these architectures until these
--
2.21.0
* [PATCH 14/63] perf kvm: Use evlist layer api when possible
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Igor Lubashev, Alexander Shishkin, Mark Rutland,
      Peter Zijlstra, Arnaldo Carvalho de Melo

From: Igor Lubashev <ilubashe@akamai.com>

No need for layer violations when a proper evlist api is available.

Signed-off-by: Igor Lubashev <ilubashe@akamai.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1571795693-23558-4-git-send-email-ilubashe@akamai.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 858da896b518..577af4f3297a 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -998,7 +998,7 @@ static int kvm_events_live_report(struct perf_kvm_stat *kvm)
 		done = perf_kvm__handle_stdin();

 		if (!rc && !done)
-			err = fdarray__poll(fda, 100);
+			err = evlist__poll(kvm->evlist, 100);
 	}

 	evlist__disable(kvm->evlist);
--
2.21.0
* [PATCH 15/63] perf probe: Fix to find range-only function instance
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Masami Hiramatsu, Jiri Olsa,
      Arnaldo Carvalho de Melo

From: Masami Hiramatsu <mhiramat@kernel.org>

Fix die_is_func_instance() to find range-only function instances.

In some cases, optimization can produce a function instance without any
low PC or entry PC, but only with address ranges (e.g. cold text placed
partially in the "text.unlikely" section). To find such function
instances, we have to check the range attribute too.

Fixes: e1ecbbc3fa83 ("perf probe: Fix to handle optimized not-inlined functions")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/157190835669.1859.8368628035930950596.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/dwarf-aux.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c
index df6cee5c071f..2ec24c3bed44 100644
--- a/tools/perf/util/dwarf-aux.c
+++ b/tools/perf/util/dwarf-aux.c
@@ -318,10 +318,14 @@ bool die_is_func_def(Dwarf_Die *dw_die)
 bool die_is_func_instance(Dwarf_Die *dw_die)
 {
 	Dwarf_Addr tmp;
+	Dwarf_Attribute attr_mem;

 	/* Actually gcc optimizes non-inline as like as inlined */
-	return !dwarf_func_inline(dw_die) && dwarf_entrypc(dw_die, &tmp) == 0;
+	return !dwarf_func_inline(dw_die) &&
+	       (dwarf_entrypc(dw_die, &tmp) == 0 ||
+		dwarf_attr(dw_die, DW_AT_ranges, &attr_mem) != NULL);
 }
+
 /**
  * die_get_data_member_location - Get the data-member offset
  * @mb_die: a DIE of a member of a data structure
--
2.21.0
* [PATCH 16/63] perf probe: Walk function lines in lexical blocks
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Masami Hiramatsu, Jiri Olsa,
      Arnaldo Carvalho de Melo

From: Masami Hiramatsu <mhiramat@kernel.org>

Since some inlined functions are in lexical blocks of a given function,
we have to walk through the DIE tree recursively. Without this fix,
perf-probe -L can miss inlined functions that are in a lexical block
(as in the if (..) { func() } case).

However, to walk the lines in a given function, we don't need to follow
the child DIEs of inlined functions, because those do not have any lines
in the specified function.

We need to walk through whole trees only if we walk all lines in a given
file, because an inlined function can include another inlined function
in the same file.

Fixes: b0e9cb2802d4 ("perf probe: Fix to search nested inlined functions in CU")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/157190836514.1859.15996864849678136353.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/dwarf-aux.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c
index 2ec24c3bed44..929b7c0567f4 100644
--- a/tools/perf/util/dwarf-aux.c
+++ b/tools/perf/util/dwarf-aux.c
@@ -678,10 +678,9 @@ static int __die_walk_funclines_cb(Dwarf_Die *in_die, void *data)
 			if (lw->retval != 0)
 				return DIE_FIND_CB_END;
 		}
+		if (!lw->recursive)
+			return DIE_FIND_CB_SIBLING;
 	}
-	if (!lw->recursive)
-		/* Don't need to search recursively */
-		return DIE_FIND_CB_SIBLING;

 	if (addr) {
 		fname = dwarf_decl_file(in_die);
@@ -728,6 +727,10 @@ static int __die_walk_culines_cb(Dwarf_Die *sp_die, void *data)
 {
 	struct __line_walk_param *lw = data;

+	/*
+	 * Since inlined function can include another inlined function in
+	 * the same file, we need to walk in it recursively.
+	 */
 	lw->retval = __die_walk_funclines(sp_die, true, lw->callback, lw->data);
 	if (lw->retval != 0)
 		return DWARF_CB_ABORT;
@@ -817,8 +820,9 @@ int die_walk_lines(Dwarf_Die *rt_die, line_walk_callback_t callback, void *data)
 	 */
 	if (rt_die != cu_die)
 		/*
-		 * Don't need walk functions recursively, because nested
-		 * inlined functions don't have lines of the specified DIE.
+		 * Don't need walk inlined functions recursively, because
+		 * inner inlined functions don't have the lines of the
+		 * specified function.
 		 */
 		ret = __die_walk_funclines(rt_die, false, callback, data);
 	else {
--
2.21.0
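
The traversal rule described above (always descend through lexical blocks, descend into inlined functions only in recursive mode) can be modelled on a toy tree. This is a simplified sketch, not libdw code: struct node, enum node_tag and count_inlined() are hypothetical names invented for the illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified DIE tags (assumption, not the DWARF constants). */
enum node_tag { TAG_LEXICAL_BLOCK, TAG_INLINED, TAG_OTHER };

struct node {
	enum node_tag tag;
	struct node *child;	/* first child */
	struct node *sibling;	/* next sibling */
};

/*
 * Count inlined-function nodes found while walking n and its siblings.
 * Non-inlined nodes such as lexical blocks are always descended into;
 * the children of an inlined function are visited only when "recursive"
 * is nonzero, mirroring the DIE_FIND_CB_SIBLING early return.
 */
static int count_inlined(const struct node *n, int recursive)
{
	int count = 0;

	for (; n; n = n->sibling) {
		if (n->tag == TAG_INLINED) {
			count++;
			if (!recursive)
				continue;	/* skip this node's children */
		}
		count += count_inlined(n->child, recursive);
	}
	return count;
}
```

With an inlined function nested inside another one, both placed in a lexical block, the non-recursive walk reports only the outer one while the recursive walk reports both.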
* [PATCH 17/63] perf probe: Fix to show function entry line as probe-able
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Masami Hiramatsu, Jiri Olsa,
      Arnaldo Carvalho de Melo

From: Masami Hiramatsu <mhiramat@kernel.org>

Fix die_walk_lines() to list the function entry line correctly.

Since dwarf_entrypc() does not return the entry PC if the DIE has only
a range attribute, __die_walk_funclines() fails to list the declaration
line (entry line) in that case.

To solve this issue, introduce die_entrypc(), which correctly returns
the entry PC (the start of the first address range) even if the DIE has
only a range attribute. With this fix, die_walk_lines() correctly shows
the function entry line as probe-able.

Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/157190837419.1859.4619125803596816752.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/dwarf-aux.c | 24 +++++++++++++++++++++++-
 tools/perf/util/dwarf-aux.h |  3 +++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c
index 929b7c0567f4..063f71da6b63 100644
--- a/tools/perf/util/dwarf-aux.c
+++ b/tools/perf/util/dwarf-aux.c
@@ -307,6 +307,28 @@ bool die_is_func_def(Dwarf_Die *dw_die)
 		dwarf_attr(dw_die, DW_AT_declaration, &attr) == NULL);
 }

+/**
+ * die_entrypc - Returns entry PC (the lowest address) of a DIE
+ * @dw_die: a DIE
+ * @addr: where to store entry PC
+ *
+ * Since dwarf_entrypc() does not return entry PC if the DIE has only address
+ * range, we have to use this to retrieve the lowest address from the address
+ * range attribute.
+ */
+int die_entrypc(Dwarf_Die *dw_die, Dwarf_Addr *addr)
+{
+	Dwarf_Addr base, end;
+
+	if (!addr)
+		return -EINVAL;
+
+	if (dwarf_entrypc(dw_die, addr) == 0)
+		return 0;
+
+	return dwarf_ranges(dw_die, 0, &base, addr, &end) < 0 ? -ENOENT : 0;
+}
+
 /**
  * die_is_func_instance - Ensure that this DIE is an instance of a subprogram
  * @dw_die: a DIE
@@ -713,7 +735,7 @@ static int __die_walk_funclines(Dwarf_Die *sp_die, bool recursive,
 	/* Handle function declaration line */
 	fname = dwarf_decl_file(sp_die);
 	if (fname && dwarf_decl_line(sp_die, &lineno) == 0 &&
-	    dwarf_entrypc(sp_die, &addr) == 0) {
+	    die_entrypc(sp_die, &addr) == 0) {
 		lw.retval = callback(fname, lineno, addr, data);
 		if (lw.retval != 0)
 			goto done;
diff --git a/tools/perf/util/dwarf-aux.h b/tools/perf/util/dwarf-aux.h
index f204e5892403..506006e0cf66 100644
--- a/tools/perf/util/dwarf-aux.h
+++ b/tools/perf/util/dwarf-aux.h
@@ -29,6 +29,9 @@ int cu_walk_functions_at(Dwarf_Die *cu_die, Dwarf_Addr addr,
 /* Get DW_AT_linkage_name (should be NULL for C binary) */
 const char *die_get_linkage_name(Dwarf_Die *dw_die);

+/* Get the lowest PC in DIE (including range list) */
+int die_entrypc(Dwarf_Die *dw_die, Dwarf_Addr *addr);
+
 /* Ensure that this DIE is a subprogram and definition (not declaration) */
 bool die_is_func_def(Dwarf_Die *dw_die);

--
2.21.0
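
The fallback order in die_entrypc() (use the entry PC when present, otherwise the start of the first address range) can be shown without libdw. The sketch below uses a hypothetical, much simplified model of a DIE; struct fake_die and fake_die_entrypc() are illustration-only names and do not exist in perf or elfutils.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range and DIE model (assumptions for illustration). */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

struct fake_die {
	int has_entry_pc;		/* nonzero if an entry/low PC exists */
	uint64_t entry_pc;
	size_t nr_ranges;		/* number of entries in ranges[] */
	const struct addr_range *ranges;
};

/*
 * Mirror of the die_entrypc() control flow: prefer the explicit entry PC,
 * fall back to the start of the first range, fail if there is neither.
 */
static int fake_die_entrypc(const struct fake_die *die, uint64_t *addr)
{
	if (!addr)
		return -EINVAL;

	if (die->has_entry_pc) {
		*addr = die->entry_pc;
		return 0;
	}

	if (die->nr_ranges == 0)
		return -ENOENT;

	*addr = die->ranges[0].start;
	return 0;
}
```

A range-only function, which made dwarf_entrypc() fail before this patch, now yields the start of its first range as the entry address.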
* [PATCH 18/63] perf jevents: Fix resource leak in process_mapfile() and main()
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Yunfeng Ye, Alexander Shishkin, Andi Kleen,
      Feilong Lin, Hu Shiyuan, Jiri Olsa, John Garry, Kan Liang,
      Luke Mujica, Mark Rutland, Peter Zijlstra, Zenghui Yu,
      Arnaldo Carvalho de Melo

From: Yunfeng Ye <yeyunfeng@huawei.com>

There are memory leaks and file descriptor resource leaks in
process_mapfile() and main().

Fix this by adding free(), fclose() and free_arch_std_events() on the
error paths.

Fixes: 80eeb67fe577 ("perf jevents: Program to convert JSON file")
Fixes: 3f056b66647b ("perf jevents: Make build fail on JSON parse error")
Fixes: e9d32c1bf0cd ("perf vendor events: Add support for arch standard events")
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Luke Mujica <lukemujica@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Link: http://lore.kernel.org/lkml/d7907042-ec9c-2bef-25b4-810e14602f89@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/pmu-events/jevents.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index 7d69727f44bd..079c77b6a2fd 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -772,6 +772,7 @@ static int process_mapfile(FILE *outfp, char *fpath)
 	char *line, *p;
 	int line_num;
 	char *tblname;
+	int ret = 0;

 	pr_info("%s: Processing mapfile %s\n", prog, fpath);

@@ -783,6 +784,7 @@ static int process_mapfile(FILE *outfp, char *fpath)
 	if (!mapfp) {
 		pr_info("%s: Error %s opening %s\n", prog, strerror(errno),
 				fpath);
+		free(line);
 		return -1;
 	}

@@ -809,7 +811,8 @@ static int process_mapfile(FILE *outfp, char *fpath)
 			/* TODO Deal with lines longer than 16K */
 			pr_info("%s: Mapfile %s: line %d too long, aborting\n",
 					prog, fpath, line_num);
-			return -1;
+			ret = -1;
+			goto out;
 		}
 		line[strlen(line)-1] = '\0';

@@ -839,7 +842,9 @@ static int process_mapfile(FILE *outfp, char *fpath)

 out:
 	print_mapping_table_suffix(outfp);
-	return 0;
+	fclose(mapfp);
+	free(line);
+	return ret;
 }

 /*
@@ -1136,6 +1141,7 @@ int main(int argc, char *argv[])
 		goto empty_map;
 	} else if (rc < 0) {
 		/* Make build fail */
+		fclose(eventsfp);
 		free_arch_std_events();
 		return 1;
 	} else if (rc) {
@@ -1148,6 +1154,7 @@ int main(int argc, char *argv[])
 		goto empty_map;
 	} else if (rc < 0) {
 		/* Make build fail */
+		fclose(eventsfp);
 		free_arch_std_events();
 		return 1;
 	} else if (rc) {
@@ -1165,6 +1172,8 @@ int main(int argc, char *argv[])
 	if (process_mapfile(eventsfp, mapfile)) {
 		pr_info("%s: Error processing mapfile %s\n", prog, mapfile);
 		/* Make build fail */
+		fclose(eventsfp);
+		free_arch_std_events();
 		return 1;
 	}

--
2.21.0
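
The "ret plus goto out" shape that process_mapfile() gains here is a common way to keep cleanup in one place. A minimal standalone sketch, under assumptions: count_lines() and MAX_LINE are hypothetical and unrelated to jevents; only the control flow is meant to match.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 64	/* hypothetical line-length limit for the sketch */

/*
 * Count the lines of the file at "path". Returns the count, or -1 on
 * error. Once both the buffer and the file are acquired, every exit goes
 * through "out" so neither of them can leak.
 */
static int count_lines(const char *path)
{
	int ret = 0;
	char *line = malloc(MAX_LINE);
	FILE *fp;

	if (!line)
		return -1;

	fp = fopen(path, "r");
	if (!fp) {
		free(line);	/* only the buffer exists at this point */
		return -1;
	}

	while (fgets(line, MAX_LINE, fp)) {
		size_t len = strlen(line);

		if (len == 0 || line[len - 1] != '\n') {
			/* line too long (or not newline-terminated) */
			ret = -1;
			goto out;
		}
		ret++;
	}
out:
	fclose(fp);
	free(line);
	return ret;
}
```

The early return before fopen() succeeds frees only what has been allocated so far, which corresponds to the free(line) added to the !mapfp branch in the patch.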
* [PATCH 19/63] perf probe: Fix wrong address verification
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo,
      Arnaldo Carvalho de Melo, Jiri Olsa

From: Masami Hiramatsu <mhiramat@kernel.org>

Since some DIEs have only ranges instead of the combination of
entrypc/highpc, address verification must use dwarf_haspc() instead of
dwarf_entrypc()/dwarf_highpc().

Also, a ranges-only DIE will have part of its code in a different
section (e.g. unlikely code will be in text.unlikely as a "FUNC.cold"
symbol). In that case, we cannot use dwarf_entrypc() or die_entrypc(),
because the offset from the original DIE can be negative.

Instead, this simply gets the symbol and offset from the symtab.

Without this patch:

  # perf probe -D clear_tasks_mm_cpumask:1
  Failed to get entry address of clear_tasks_mm_cpumask
    Error: Failed to add events.

And with this patch:

  # perf probe -D clear_tasks_mm_cpumask:1
  p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0
  p:probe/clear_tasks_mm_cpumask_1 clear_tasks_mm_cpumask+5
  p:probe/clear_tasks_mm_cpumask_2 clear_tasks_mm_cpumask+8
  p:probe/clear_tasks_mm_cpumask_3 clear_tasks_mm_cpumask+16
  p:probe/clear_tasks_mm_cpumask_4 clear_tasks_mm_cpumask+82

Committer testing:

I managed to reproduce the above:

  [root@quaco ~]# perf probe -D clear_tasks_mm_cpumask:1
  p:probe/clear_tasks_mm_cpumask _text+919968
  p:probe/clear_tasks_mm_cpumask_1 _text+919973
  p:probe/clear_tasks_mm_cpumask_2 _text+919976
  [root@quaco ~]#

But then when trying to actually put the probe in place, it fails if I
use :0 as the offset:

  [root@quaco ~]# perf probe -L clear_tasks_mm_cpumask | head -5
  <clear_tasks_mm_cpumask@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/kernel/cpu.c:0>
        0  void clear_tasks_mm_cpumask(int cpu)
        1  {
        2  	struct task_struct *p;

  [root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
  Probe point 'clear_tasks_mm_cpumask' not found.
    Error: Failed to add events.
  [root@quaco

The next patch is needed to fix this case.

Fixes: 576b523721b7 ("perf probe: Fix probing symbols with optimization suffix")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/157199318513.8075.10463906803299647907.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/probe-finder.c | 32 ++++++++++----------------------
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index cd9f95e5044e..2b6513e5725c 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -604,38 +604,26 @@ static int convert_to_trace_point(Dwarf_Die *sp_die, Dwfl_Module *mod,
 				  const char *function,
 				  struct probe_trace_point *tp)
 {
-	Dwarf_Addr eaddr, highaddr;
+	Dwarf_Addr eaddr;
 	GElf_Sym sym;
 	const char *symbol;

 	/* Verify the address is correct */
-	if (dwarf_entrypc(sp_die, &eaddr) != 0) {
-		pr_warning("Failed to get entry address of %s\n",
-			   dwarf_diename(sp_die));
-		return -ENOENT;
-	}
-	if (dwarf_highpc(sp_die, &highaddr) != 0) {
-		pr_warning("Failed to get end address of %s\n",
-			   dwarf_diename(sp_die));
-		return -ENOENT;
-	}
-	if (paddr > highaddr) {
-		pr_warning("Offset specified is greater than size of %s\n",
+	if (!dwarf_haspc(sp_die, paddr)) {
+		pr_warning("Specified offset is out of %s\n",
 			   dwarf_diename(sp_die));
 		return -EINVAL;
 	}

-	symbol = dwarf_diename(sp_die);
+	/* Try to get actual symbol name from symtab */
+	symbol = dwfl_module_addrsym(mod, paddr, &sym, NULL);
 	if (!symbol) {
-		/* Try to get the symbol name from symtab */
-		symbol = dwfl_module_addrsym(mod, paddr, &sym, NULL);
-		if (!symbol) {
-			pr_warning("Failed to find symbol at 0x%lx\n",
-				   (unsigned long)paddr);
-			return -ENOENT;
-		}
-		eaddr = sym.st_value;
+		pr_warning("Failed to find symbol at 0x%lx\n",
+			   (unsigned long)paddr);
+		return -ENOENT;
 	}
+	eaddr = sym.st_value;
+
 	tp->offset = (unsigned long)(paddr - eaddr);
 	tp->address = (unsigned long)paddr;
 	tp->symbol = strdup(symbol);
--
2.21.0
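
Why a single entrypc/highpc comparison is not enough for a split function can be seen with made-up numbers. The sketch below is only an illustration: struct addr_range, ranges_have_pc() and offset_in_symbol() are hypothetical names, with ranges_have_pc() playing the role that dwarf_haspc() plays in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end). */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/* Rough analogue of dwarf_haspc(): is addr inside any of the ranges? */
static int ranges_have_pc(const struct addr_range *ranges, size_t nr,
			  uint64_t addr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (addr >= ranges[i].start && addr < ranges[i].end)
			return 1;
	return 0;
}

/*
 * Offset of addr from the symbol that actually covers it, which is what
 * tp->offset becomes once the symbol comes from the symtab lookup.
 */
static uint64_t offset_in_symbol(uint64_t addr, uint64_t sym_value)
{
	return addr - sym_value;
}
```

Suppose the hot part of a function occupies [0x2000, 0x2040) and its cold part was moved to [0x1000, 0x1010). The address 0x1004 belongs to the function, yet it lies below the hot entry address, so an offset computed from that entry would be negative; measured from the covering "FUNC.cold" symbol at 0x1000 it is simply 4.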
* [PATCH 20/63] perf probe: Fix to probe a function which has no entry pc
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo, Jiri Olsa

From: Masami Hiramatsu <mhiramat@kernel.org>

Fix 'perf probe' to probe a function which has no entry pc or low pc but
only has ranges attribute.

probe_point_search_cb() uses dwarf_entrypc() to get the probe address,
but that doesn't work for the function DIE which has only ranges
attribute.

Use die_entrypc() instead.

Without this fix:

  # perf probe -k ../build-x86_64/vmlinux -D clear_tasks_mm_cpumask:0
  Probe point 'clear_tasks_mm_cpumask' not found.
    Error: Failed to add events.

With this:

  # perf probe -k ../build-x86_64/vmlinux -D clear_tasks_mm_cpumask:0
  p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0

Committer testing:

Before:

  [root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
  Probe point 'clear_tasks_mm_cpumask' not found.
    Error: Failed to add events.
  [root@quaco ~]#

After:

  [root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
  Added new event:
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask)

  You can now use it in all perf tools, such as:

  	perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1

  [root@quaco ~]#

Using it with 'perf trace':

  [root@quaco ~]# perf trace -e probe:clear_tasks_mm_cpumask

Doesn't seem to be used in x86_64:

  $ find . -name "*.c" | xargs grep clear_tasks_mm_cpumask
  ./kernel/cpu.c: * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU
  ./kernel/cpu.c:void clear_tasks_mm_cpumask(int cpu)
  ./arch/xtensa/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
  ./arch/csky/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
  ./arch/sh/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
  ./arch/arm/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
  ./arch/powerpc/mm/nohash/mmu_context.c: clear_tasks_mm_cpumask(cpu);
  $ find . -name "*.h" | xargs grep clear_tasks_mm_cpumask
  ./include/linux/cpu.h:void clear_tasks_mm_cpumask(int cpu);
  $ find . -name "*.S" | xargs grep clear_tasks_mm_cpumask
  $

Fixes: e1ecbbc3fa83 ("perf probe: Fix to handle optimized not-inlined functions")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/157199319438.8075.4695576954550638618.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/probe-finder.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 2b6513e5725c..71633f55f045 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -982,7 +982,7 @@ static int probe_point_search_cb(Dwarf_Die *sp_die, void *data)
 			param->retval = find_probe_point_by_line(pf);
 	} else if (die_is_func_instance(sp_die)) {
 		/* Instances always have the entry address */
-		dwarf_entrypc(sp_die, &pf->addr);
+		die_entrypc(sp_die, &pf->addr);
 		/* But in some case the entry address is 0 */
 		if (pf->addr == 0) {
 			pr_debug("%s has no entry PC. Skipped\n",
-- 
2.21.0
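To illustrate why a ranges-only DIE needs a different lookup, here is a minimal sketch of an "entry pc with fallback" helper. The implementation of die_entrypc() is not shown in this message, so this is a guess at the general shape and not the perf code: `die_model`, `addr_range` and `model_entrypc()` are hypothetical, and the assumption is that the start of the first range can stand in for the entry address when no explicit low pc is recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one [low, high) address range. */
struct addr_range {
	uint64_t low;
	uint64_t high;	/* exclusive */
};

/* Hypothetical model of a function DIE's address attributes. */
struct die_model {
	int has_low_pc;			/* non-zero if low_pc is valid */
	uint64_t low_pc;
	const struct addr_range *ranges;
	size_t nr_ranges;
};

/*
 * Stores the entry address in *addr and returns 0, or returns -1 when
 * the DIE has neither a low pc nor any range.
 */
static int model_entrypc(const struct die_model *d, uint64_t *addr)
{
	if (d->has_low_pc) {
		*addr = d->low_pc;
		return 0;
	}
	/* Assumption: the first range starts at the function entry. */
	if (d->nr_ranges > 0) {
		*addr = d->ranges[0].low;
		return 0;
	}
	return -1;
}
```

A caller that only consulted `low_pc` would report "no entry address" for the ranges-only case, which matches the "Probe point ... not found" symptom described above.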
* [PATCH 21/63] perf probe: Fix to probe an inline function which has no entry pc
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo, Jiri Olsa

From: Masami Hiramatsu <mhiramat@kernel.org>

Fix 'perf probe' to probe an inline function which has no entry pc or
low pc but only has a ranges attribute.

This seems to be a very rare case, but I could find a few examples. As
in probe_point_search_cb(), use die_entrypc() to get the entry address
in probe_point_inline_cb() too.

Without this patch:

  # perf probe -D __amd_put_nb_event_constraints
  Failed to get entry address of __amd_put_nb_event_constraints.
  Probe point '__amd_put_nb_event_constraints' not found.
    Error: Failed to add events.

With this patch:

  # perf probe -D __amd_put_nb_event_constraints
  p:probe/__amd_put_nb_event_constraints amd_put_event_constraints+43

Committer testing:

Before:

  [root@quaco ~]# perf probe -D __amd_put_nb_event_constraints
  Failed to get entry address of __amd_put_nb_event_constraints.
  Probe point '__amd_put_nb_event_constraints' not found.
    Error: Failed to add events.
  [root@quaco ~]#

After:

  [root@quaco ~]# perf probe -D __amd_put_nb_event_constraints
  p:probe/__amd_put_nb_event_constraints _text+33789
  [root@quaco ~]#

Fixes: 4ea42b181434 ("perf: Add perf probe subcommand, a kprobe-event setup helper")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/157199320336.8075.16189530425277588587.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/probe-finder.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 71633f55f045..2fa932bcf960 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -930,7 +930,7 @@ static int probe_point_inline_cb(Dwarf_Die *in_die, void *data)
 		ret = find_probe_point_lazy(in_die, pf);
 	else {
 		/* Get probe address */
-		if (dwarf_entrypc(in_die, &addr) != 0) {
+		if (die_entrypc(in_die, &addr) != 0) {
 			pr_warning("Failed to get entry address of %s.\n",
 				   dwarf_diename(in_die));
 			return -ENOENT;
-- 
2.21.0
* [PATCH 22/63] perf probe: Fix to list probe event with correct line number
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo, Jiri Olsa

From: Masami Hiramatsu <mhiramat@kernel.org>

Since debuginfo__find_probe_point() uses dwarf_entrypc() to find the
entry address of the function on which a probe is placed, it fails when
the function DIE has only a ranges attribute.

To fix this issue, use die_entrypc() instead of dwarf_entrypc().

Without this fix, 'perf probe -l' shows an incorrect offset:

  # perf probe -l
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask+18446744071579263632@work/linux/linux/kernel/cpu.c)
    probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask+18446744071579263752@work/linux/linux/kernel/cpu.c)

With this:

  # perf probe -l
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@work/linux/linux/kernel/cpu.c)
    probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:21@work/linux/linux/kernel/cpu.c)

Committer testing:

Before:

  [root@quaco ~]# perf probe -l
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask+18446744071579765152@kernel/cpu.c)
  [root@quaco ~]#

After:

  [root@quaco ~]# perf probe -l
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
  [root@quaco ~]#

Fixes: 1d46ea2a6a40 ("perf probe: Fix listing incorrect line number with inline function")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/157199321227.8075.14655572419136993015.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/probe-finder.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 2fa932bcf960..88e17a4f5ac3 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -1566,7 +1566,7 @@ int debuginfo__find_probe_point(struct debuginfo *dbg, unsigned long addr,
 		/* Get function entry information */
 		func = basefunc = dwarf_diename(&spdie);
 		if (!func ||
-		    dwarf_entrypc(&spdie, &baseaddr) != 0 ||
+		    die_entrypc(&spdie, &baseaddr) != 0 ||
 		    dwarf_decl_line(&spdie, &baseline) != 0) {
 			lineno = 0;
 			goto post;
@@ -1583,7 +1583,7 @@ int debuginfo__find_probe_point(struct debuginfo *dbg, unsigned long addr,
 		while (die_find_top_inlinefunc(&spdie, (Dwarf_Addr)addr,
 						&indie)) {
 			/* There is an inline function */
-			if (dwarf_entrypc(&indie, &_addr) == 0 &&
+			if (die_entrypc(&indie, &_addr) == 0 &&
 			    _addr == addr) {
 				/*
 				 * addr is at an inline function entry.
-- 
2.21.0
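The huge offsets in the "Without this fix" output are worth a second look: 18446744071579263632 is 0xffffffff81066290 in hex, which has the shape of an x86_64 kernel text address and not of an offset within a function. One reading, which is an assumption and not something the message states, is that the offset was computed against a base address that was never filled in because the entry-pc lookup failed. The arithmetic can be checked with a tiny sketch (`offset_from()` is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Offset of addr from base, computed in unsigned 64-bit arithmetic. */
static uint64_t offset_from(uint64_t addr, uint64_t base)
{
	return addr - base;
}
```

With a base of 0 the "offset" is simply the absolute address, and the two bogus values in the first listing differ by 120 bytes, consistent with two probes placed in the same function.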
* [PATCH 23/63] perf probe: Fix to show inlined function callsite without entry_pc 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (21 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 22/63] perf probe: Fix to list probe event with correct line number Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 24/63] perf probe: Fix to show ranges of variables in functions " Arnaldo Carvalho de Melo ` (39 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo, Jiri Olsa From: Masami Hiramatsu <mhiramat@kernel.org> Fix 'perf probe --line' option to show inlined function callsite lines even if the function DIE has only ranges. Without this: # perf probe -L amd_put_event_constraints ... 2 { 3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)) __amd_put_nb_event_constraints(cpuc, event); 5 } With this patch: # perf probe -L amd_put_event_constraints ... 
2 { 3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)) 4 __amd_put_nb_event_constraints(cpuc, event); 5 } Committer testing: Before: [root@quaco ~]# perf probe -L amd_put_event_constraints <amd_put_event_constraints@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/arch/x86/events/amd/core.c:0> 0 static void amd_put_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event) 2 { 3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)) __amd_put_nb_event_constraints(cpuc, event); 5 } PMU_FORMAT_ATTR(event, "config:0-7,32-35"); PMU_FORMAT_ATTR(umask, "config:8-15" ); [root@quaco ~]# After: [root@quaco ~]# perf probe -L amd_put_event_constraints <amd_put_event_constraints@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/arch/x86/events/amd/core.c:0> 0 static void amd_put_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event) 2 { 3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)) 4 __amd_put_nb_event_constraints(cpuc, event); 5 } PMU_FORMAT_ATTR(event, "config:0-7,32-35"); PMU_FORMAT_ATTR(umask, "config:8-15" ); [root@quaco ~]# perf probe amd_put_event_constraints:4 Added new event: probe:amd_put_event_constraints (on amd_put_event_constraints:4) You can now use it in all perf tools, such as: perf record -e probe:amd_put_event_constraints -aR sleep 1 [root@quaco ~]# [root@quaco ~]# perf probe -l probe:amd_put_event_constraints (on amd_put_event_constraints:4@arch/x86/events/amd/core.c) probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c) [root@quaco ~]# Using it: [root@quaco ~]# perf trace -e probe:* ^C[root@quaco ~]# Ok, Intel system here... 
:-) Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/157199322107.8075.12659099000567865708.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/dwarf-aux.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c index 063f71da6b63..e0c507d6b3b4 100644 --- a/tools/perf/util/dwarf-aux.c +++ b/tools/perf/util/dwarf-aux.c @@ -695,7 +695,7 @@ static int __die_walk_funclines_cb(Dwarf_Die *in_die, void *data) if (dwarf_tag(in_die) == DW_TAG_inlined_subroutine) { fname = die_get_call_file(in_die); lineno = die_get_call_lineno(in_die); - if (fname && lineno > 0 && dwarf_entrypc(in_die, &addr) == 0) { + if (fname && lineno > 0 && die_entrypc(in_die, &addr) == 0) { lw->retval = lw->callback(fname, lineno, addr, lw->data); if (lw->retval != 0) return DIE_FIND_CB_END; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 24/63] perf probe: Fix to show ranges of variables in functions without entry_pc
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo, Jiri Olsa

From: Masami Hiramatsu <mhiramat@kernel.org>

Fix 'perf probe' to show ranges of variables (--range and --vars
options) in functions whose DIE has only ranges but no entry_pc
attribute.

Without this fix:

  # perf probe --range -V clear_tasks_mm_cpumask
  Available variables at clear_tasks_mm_cpumask
  	@<clear_tasks_mm_cpumask+0>
  		(No matched variables)

With this fix:

  # perf probe --range -V clear_tasks_mm_cpumask
  Available variables at clear_tasks_mm_cpumask
  	@<clear_tasks_mm_cpumask+0>
  		[VAL]	int	cpu	@<clear_tasks_mm_cpumask+[0-35,317-317,2052-2059]>

Committer testing:

Before:

  [root@quaco ~]# perf probe --range -V clear_tasks_mm_cpumask
  Available variables at clear_tasks_mm_cpumask
  	@<clear_tasks_mm_cpumask+0>
  		(No matched variables)
  [root@quaco ~]#

After:

  [root@quaco ~]# perf probe --range -V clear_tasks_mm_cpumask
  Available variables at clear_tasks_mm_cpumask
  	@<clear_tasks_mm_cpumask+0>
  		[VAL]	int	cpu	@<clear_tasks_mm_cpumask+[0-23,23-105,105-106,106-106,1843-1850,1850-1862]>
  [root@quaco ~]#

Using it:

  [root@quaco ~]# perf probe clear_tasks_mm_cpumask cpu
  Added new event:
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask with cpu)

  You can now use it in all perf tools, such as:

  	perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1

  [root@quaco ~]# perf probe -l
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c with cpu)
  [root@quaco ~]#
  [root@quaco ~]# perf trace -e probe:*cpumask
  ^C[root@quaco ~]#

Fixes: 349e8d261131 ("perf probe: Add --range option to show a variable's location range")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/157199323018.8075.8179744380479673672.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/dwarf-aux.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c
index e0c507d6b3b4..ac82fd937e4b 100644
--- a/tools/perf/util/dwarf-aux.c
+++ b/tools/perf/util/dwarf-aux.c
@@ -1019,7 +1019,7 @@ static int die_get_var_innermost_scope(Dwarf_Die *sp_die, Dwarf_Die *vr_die,
 	bool first = true;
 	const char *name;
 
-	ret = dwarf_entrypc(sp_die, &entry);
+	ret = die_entrypc(sp_die, &entry);
 	if (ret)
 		return ret;
 
@@ -1082,7 +1082,7 @@ int die_get_var_range(Dwarf_Die *sp_die, Dwarf_Die *vr_die, struct strbuf *buf)
 	bool first = true;
 	const char *name;
 
-	ret = dwarf_entrypc(sp_die, &entry);
+	ret = die_entrypc(sp_die, &entry);
 	if (ret)
 		return ret;
 
-- 
2.21.0
* [PATCH 25/63] perf auxtrace: Add auxtrace_cache__remove() 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (23 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 24/63] perf probe: Fix to show ranges of variables in functions " Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 26/63] perf dso: Refactor dso_cache__read() Arnaldo Carvalho de Melo ` (37 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Alexander Shishkin, Borislav Petkov, H . Peter Anvin, Jiri Olsa, Leo Yan, Mark Rutland, Mathieu Poirier, Peter Zijlstra, x86, Arnaldo Carvalho de Melo From: Adrian Hunter <adrian.hunter@intel.com> Add auxtrace_cache__remove(). Intel PT uses an auxtrace_cache to store the results of code-walking, so that the same block of instructions does not have to be decoded repeatedly. However, when there are text poke events, the associated cache entries need to be removed. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. 
Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191025130000.13032-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/auxtrace.c | 28 ++++++++++++++++++++++++++++ tools/perf/util/auxtrace.h | 1 + 2 files changed, 29 insertions(+) diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c index 8470dfe9fe97..c555c3ccd79d 100644 --- a/tools/perf/util/auxtrace.c +++ b/tools/perf/util/auxtrace.c @@ -1457,6 +1457,34 @@ int auxtrace_cache__add(struct auxtrace_cache *c, u32 key, return 0; } +static struct auxtrace_cache_entry *auxtrace_cache__rm(struct auxtrace_cache *c, + u32 key) +{ + struct auxtrace_cache_entry *entry; + struct hlist_head *hlist; + struct hlist_node *n; + + if (!c) + return NULL; + + hlist = &c->hashtable[hash_32(key, c->bits)]; + hlist_for_each_entry_safe(entry, n, hlist, hash) { + if (entry->key == key) { + hlist_del(&entry->hash); + return entry; + } + } + + return NULL; +} + +void auxtrace_cache__remove(struct auxtrace_cache *c, u32 key) +{ + struct auxtrace_cache_entry *entry = auxtrace_cache__rm(c, key); + + auxtrace_cache__free_entry(c, entry); +} + void *auxtrace_cache__lookup(struct auxtrace_cache *c, u32 key) { struct auxtrace_cache_entry *entry; diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h index f201f36bc35f..3f4aa5427d76 100644 --- a/tools/perf/util/auxtrace.h +++ b/tools/perf/util/auxtrace.h @@ -489,6 +489,7 @@ void *auxtrace_cache__alloc_entry(struct auxtrace_cache *c); void auxtrace_cache__free_entry(struct auxtrace_cache *c, void *entry); int auxtrace_cache__add(struct auxtrace_cache *c, u32 key, struct auxtrace_cache_entry *entry); +void auxtrace_cache__remove(struct auxtrace_cache *c, u32 key); void 
*auxtrace_cache__lookup(struct auxtrace_cache *c, u32 key); struct auxtrace_record *auxtrace_record__init(struct evlist *evlist, -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
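The remove operation above is the usual "unlink from the hash bucket, then free" pattern. A minimal standalone sketch of it follows, using a plain singly linked chain where the patch uses the kernel-style hlist helpers. All names here (`cache`, `entry`, `cache_add()`, `cache_rm()`, `cache_remove()`, `NR_BUCKETS`) are hypothetical, and the modulo hash is a stand-in for hash_32().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_BUCKETS 16

struct entry {
	uint32_t key;
	int value;
	struct entry *next;
};

struct cache {
	struct entry *bucket[NR_BUCKETS];
};

/* Stand-in hash: picks the bucket chain for a key. */
static struct entry **bucket_of(struct cache *c, uint32_t key)
{
	return &c->bucket[key % NR_BUCKETS];
}

/* Inserts a new entry at the head of its bucket. Returns 0 or -1. */
static int cache_add(struct cache *c, uint32_t key, int value)
{
	struct entry **head = bucket_of(c, key);
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->key = key;
	e->value = value;
	e->next = *head;
	*head = e;
	return 0;
}

/* Returns the entry for key, or NULL if it is not cached. */
static struct entry *cache_lookup(struct cache *c, uint32_t key)
{
	struct entry *e;

	for (e = *bucket_of(c, key); e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

/* Unlinks and returns the entry for key, or NULL if there is none. */
static struct entry *cache_rm(struct cache *c, uint32_t key)
{
	struct entry **pp;

	if (!c)
		return NULL;
	for (pp = bucket_of(c, key); *pp; pp = &(*pp)->next) {
		if ((*pp)->key == key) {
			struct entry *e = *pp;

			*pp = e->next;
			return e;
		}
	}
	return NULL;
}

/* Drops a stale entry; a missing key is a no-op since free(NULL) is safe. */
static void cache_remove(struct cache *c, uint32_t key)
{
	free(cache_rm(c, key));
}
```

Splitting "unlink" from "free" keeps the lookup loop free of ownership decisions, which is the same split the patch makes between auxtrace_cache__rm() and auxtrace_cache__remove().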
* [PATCH 26/63] perf dso: Refactor dso_cache__read() 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (24 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 25/63] perf auxtrace: Add auxtrace_cache__remove() Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 27/63] perf dso: Add dso__data_write_cache_addr() Arnaldo Carvalho de Melo ` (36 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Alexander Shishkin, Borislav Petkov, H . Peter Anvin, Jiri Olsa, Leo Yan, Mark Rutland, Mathieu Poirier, Peter Zijlstra, x86, Arnaldo Carvalho de Melo From: Adrian Hunter <adrian.hunter@intel.com> Refactor dso_cache__read() to separate populating the cache from copying data from it. This is preparation for adding a cache "write" that will update the data in the cache. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. 
Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191025130000.13032-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/dso.c | 64 +++++++++++++++++++++++++------------------ 1 file changed, 37 insertions(+), 27 deletions(-) diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c index e11ddf86f2b3..460330d125b6 100644 --- a/tools/perf/util/dso.c +++ b/tools/perf/util/dso.c @@ -768,7 +768,7 @@ dso_cache__free(struct dso *dso) pthread_mutex_unlock(&dso->lock); } -static struct dso_cache *dso_cache__find(struct dso *dso, u64 offset) +static struct dso_cache *__dso_cache__find(struct dso *dso, u64 offset) { const struct rb_root *root = &dso->data.cache; struct rb_node * const *p = &root->rb_node; @@ -863,54 +863,64 @@ static ssize_t file_read(struct dso *dso, struct machine *machine, return ret; } -static ssize_t -dso_cache__read(struct dso *dso, struct machine *machine, - u64 offset, u8 *data, ssize_t size) +static struct dso_cache *dso_cache__populate(struct dso *dso, + struct machine *machine, + u64 offset, ssize_t *ret) { u64 cache_offset = offset & DSO__DATA_CACHE_MASK; struct dso_cache *cache; struct dso_cache *old; - ssize_t ret; cache = zalloc(sizeof(*cache) + DSO__DATA_CACHE_SIZE); - if (!cache) - return -ENOMEM; + if (!cache) { + *ret = -ENOMEM; + return NULL; + } if (dso->binary_type == DSO_BINARY_TYPE__BPF_PROG_INFO) - ret = bpf_read(dso, cache_offset, cache->data); + *ret = bpf_read(dso, cache_offset, cache->data); else - ret = file_read(dso, machine, cache_offset, cache->data); + *ret = file_read(dso, machine, cache_offset, cache->data); - if (ret > 0) { - cache->offset = cache_offset; - cache->size = ret; + if (*ret <= 0) { + free(cache); + return NULL; + } - old = 
dso_cache__insert(dso, cache); - if (old) { - /* we lose the race */ - free(cache); - cache = old; - } + cache->offset = cache_offset; + cache->size = *ret; - ret = dso_cache__memcpy(cache, offset, data, size); + old = dso_cache__insert(dso, cache); + if (old) { + /* we lose the race */ + free(cache); + cache = old; } - if (ret <= 0) - free(cache); + return cache; +} - return ret; +static struct dso_cache *dso_cache__find(struct dso *dso, + struct machine *machine, + u64 offset, + ssize_t *ret) +{ + struct dso_cache *cache = __dso_cache__find(dso, offset); + + return cache ? cache : dso_cache__populate(dso, machine, offset, ret); } static ssize_t dso_cache_read(struct dso *dso, struct machine *machine, u64 offset, u8 *data, ssize_t size) { struct dso_cache *cache; + ssize_t ret = 0; - cache = dso_cache__find(dso, offset); - if (cache) - return dso_cache__memcpy(cache, offset, data, size); - else - return dso_cache__read(dso, machine, offset, data, size); + cache = dso_cache__find(dso, machine, offset, &ret); + if (!cache) + return ret; + + return dso_cache__memcpy(cache, offset, data, size); } /* -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
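The refactoring above separates "find a cached chunk" from "populate the cache on a miss", with the error code passed back through an out-parameter so that a NULL return can mean either an error or end of data. A minimal sketch of that shape follows, under simplifying assumptions: a fixed array of slots stands in for the rb-tree, an in-memory buffer stands in for the file, and there is no locking. `store`, `chunk`, `store_lookup()`, `store_populate()` and `store_find()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 8
#define NR_CHUNKS 4

struct chunk {
	uint64_t offset;		/* chunk-aligned start offset */
	long size;			/* valid bytes in data[] */
	unsigned char data[CHUNK_SIZE];
};

struct store {
	const unsigned char *backing;	/* stand-in for the file contents */
	uint64_t backing_size;
	struct chunk *slot[NR_CHUNKS];	/* stand-in for the rb-tree */
	int loads;			/* how many times we hit the backing */
};

/* Pure lookup: returns the cached chunk covering offset, or NULL. */
static struct chunk *store_lookup(struct store *s, uint64_t offset)
{
	uint64_t idx = offset / CHUNK_SIZE;

	return idx < NR_CHUNKS ? s->slot[idx] : NULL;
}

/*
 * Reads the chunk covering offset into the cache. On failure returns
 * NULL with *ret set to 0 (nothing to read) or -1 (allocation failed).
 */
static struct chunk *store_populate(struct store *s, uint64_t offset,
				    long *ret)
{
	uint64_t start = offset - offset % CHUNK_SIZE;
	uint64_t idx = start / CHUNK_SIZE;
	uint64_t n;
	struct chunk *c;

	if (idx >= NR_CHUNKS || start >= s->backing_size) {
		*ret = 0;
		return NULL;
	}
	c = calloc(1, sizeof(*c));
	if (!c) {
		*ret = -1;
		return NULL;
	}
	n = s->backing_size - start;
	if (n > CHUNK_SIZE)
		n = CHUNK_SIZE;
	memcpy(c->data, s->backing + start, n);
	c->offset = start;
	c->size = (long)n;
	s->slot[idx] = c;
	s->loads++;
	*ret = (long)n;
	return c;
}

/* Find-or-populate: the only entry point callers need. */
static struct chunk *store_find(struct store *s, uint64_t offset, long *ret)
{
	struct chunk *c = store_lookup(s, offset);

	return c ? c : store_populate(s, offset, ret);
}
```

Once lookup and population sit behind one call, a reader and a writer can share it and differ only in what they do with the returned chunk.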
* [PATCH 27/63] perf dso: Add dso__data_write_cache_addr() 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (25 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 26/63] perf dso: Refactor dso_cache__read() Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 28/63] perf map: Check if the map still has some refcounts on exit Arnaldo Carvalho de Melo ` (35 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Alexander Shishkin, Borislav Petkov, H . Peter Anvin, Jiri Olsa, Leo Yan, Mark Rutland, Mathieu Poirier, Peter Zijlstra, x86, Arnaldo Carvalho de Melo From: Adrian Hunter <adrian.hunter@intel.com> Add functions to write into the dso file data cache, but not change the file itself. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. 
Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191025130000.13032-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/dso.c | 73 ++++++++++++++++++++++++++++++++++--------- tools/perf/util/dso.h | 7 +++++ 2 files changed, 65 insertions(+), 15 deletions(-) diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c index 460330d125b6..0f1b77275a86 100644 --- a/tools/perf/util/dso.c +++ b/tools/perf/util/dso.c @@ -827,14 +827,16 @@ dso_cache__insert(struct dso *dso, struct dso_cache *new) return cache; } -static ssize_t -dso_cache__memcpy(struct dso_cache *cache, u64 offset, - u8 *data, u64 size) +static ssize_t dso_cache__memcpy(struct dso_cache *cache, u64 offset, u8 *data, + u64 size, bool out) { u64 cache_offset = offset - cache->offset; u64 cache_size = min(cache->size - cache_offset, size); - memcpy(data, cache->data + cache_offset, cache_size); + if (out) + memcpy(data, cache->data + cache_offset, cache_size); + else + memcpy(cache->data + cache_offset, data, cache_size); return cache_size; } @@ -910,8 +912,8 @@ static struct dso_cache *dso_cache__find(struct dso *dso, return cache ? 
cache : dso_cache__populate(dso, machine, offset, ret); } -static ssize_t dso_cache_read(struct dso *dso, struct machine *machine, - u64 offset, u8 *data, ssize_t size) +static ssize_t dso_cache_io(struct dso *dso, struct machine *machine, + u64 offset, u8 *data, ssize_t size, bool out) { struct dso_cache *cache; ssize_t ret = 0; @@ -920,16 +922,16 @@ static ssize_t dso_cache_read(struct dso *dso, struct machine *machine, if (!cache) return ret; - return dso_cache__memcpy(cache, offset, data, size); + return dso_cache__memcpy(cache, offset, data, size, out); } /* * Reads and caches dso data DSO__DATA_CACHE_SIZE size chunks * in the rb_tree. Any read to already cached data is served - * by cached data. + * by cached data. Writes update the cache only, not the backing file. */ -static ssize_t cached_read(struct dso *dso, struct machine *machine, - u64 offset, u8 *data, ssize_t size) +static ssize_t cached_io(struct dso *dso, struct machine *machine, + u64 offset, u8 *data, ssize_t size, bool out) { ssize_t r = 0; u8 *p = data; @@ -937,7 +939,7 @@ static ssize_t cached_read(struct dso *dso, struct machine *machine, do { ssize_t ret; - ret = dso_cache_read(dso, machine, offset, p, size); + ret = dso_cache_io(dso, machine, offset, p, size, out); if (ret < 0) return ret; @@ -1021,8 +1023,9 @@ off_t dso__data_size(struct dso *dso, struct machine *machine) return dso->data.file_size; } -static ssize_t data_read_offset(struct dso *dso, struct machine *machine, - u64 offset, u8 *data, ssize_t size) +static ssize_t data_read_write_offset(struct dso *dso, struct machine *machine, + u64 offset, u8 *data, ssize_t size, + bool out) { if (dso__data_file_size(dso, machine)) return -1; @@ -1034,7 +1037,7 @@ static ssize_t data_read_offset(struct dso *dso, struct machine *machine, if (offset + size < offset) return -1; - return cached_read(dso, machine, offset, data, size); + return cached_io(dso, machine, offset, data, size, out); } /** @@ -1054,7 +1057,7 @@ ssize_t 
dso__data_read_offset(struct dso *dso, struct machine *machine, if (dso->data.status == DSO_DATA_STATUS_ERROR) return -1; - return data_read_offset(dso, machine, offset, data, size); + return data_read_write_offset(dso, machine, offset, data, size, true); } /** @@ -1075,6 +1078,46 @@ ssize_t dso__data_read_addr(struct dso *dso, struct map *map, return dso__data_read_offset(dso, machine, offset, data, size); } +/** + * dso__data_write_cache_offs - Write data to dso data cache at file offset + * @dso: dso object + * @machine: machine object + * @offset: file offset + * @data: buffer to write + * @size: size of the @data buffer + * + * Write into the dso file data cache, but do not change the file itself. + */ +ssize_t dso__data_write_cache_offs(struct dso *dso, struct machine *machine, + u64 offset, const u8 *data_in, ssize_t size) +{ + u8 *data = (u8 *)data_in; /* cast away const to use same fns for r/w */ + + if (dso->data.status == DSO_DATA_STATUS_ERROR) + return -1; + + return data_read_write_offset(dso, machine, offset, data, size, false); +} + +/** + * dso__data_write_cache_addr - Write data to dso data cache at dso address + * @dso: dso object + * @machine: machine object + * @add: virtual memory address + * @data: buffer to write + * @size: size of the @data buffer + * + * External interface to write into the dso file data cache, but do not change + * the file itself. 
+ */ +ssize_t dso__data_write_cache_addr(struct dso *dso, struct map *map, + struct machine *machine, u64 addr, + const u8 *data, ssize_t size) +{ + u64 offset = map->map_ip(map, addr); + return dso__data_write_cache_offs(dso, machine, offset, data, size); +} + struct map *dso__new_map(const char *name) { struct map *map = NULL; diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h index e4dddb76770d..2f1fcbc6fead 100644 --- a/tools/perf/util/dso.h +++ b/tools/perf/util/dso.h @@ -285,6 +285,8 @@ void dso__set_module_info(struct dso *dso, struct kmod_path *m, * dso__data_size * dso__data_read_offset * dso__data_read_addr + * dso__data_write_cache_offs + * dso__data_write_cache_addr * * Please refer to the dso.c object code for each function and * arguments documentation. Following text tries to explain the @@ -332,6 +334,11 @@ ssize_t dso__data_read_addr(struct dso *dso, struct map *map, struct machine *machine, u64 addr, u8 *data, ssize_t size); bool dso__data_status_seen(struct dso *dso, enum dso_data_status_seen by); +ssize_t dso__data_write_cache_offs(struct dso *dso, struct machine *machine, + u64 offset, const u8 *data, ssize_t size); +ssize_t dso__data_write_cache_addr(struct dso *dso, struct map *map, + struct machine *machine, u64 addr, + const u8 *data, ssize_t size); struct map *dso__new_map(const char *name); struct dso *machine__findnew_kernel(struct machine *machine, const char *name, -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
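The core of the change above is a single copy helper whose direction is selected by a flag, with the length clamped to what the cached chunk actually holds. A minimal sketch of that helper follows. `chunk` and `chunk_copy()` are hypothetical names, and the precondition that the offset lies inside the chunk is an assumption made explicit here with assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct chunk {
	uint64_t offset;		/* file offset of data[0] */
	uint64_t size;			/* valid bytes in data[] */
	unsigned char data[16];
};

/*
 * Copies up to size bytes between buf and the chunk, starting at file
 * offset `offset`. out == true reads from the chunk into buf; out ==
 * false writes buf into the chunk. Returns the number of bytes copied,
 * which is smaller than size when the request runs past the chunk end.
 */
static uint64_t chunk_copy(struct chunk *c, uint64_t offset,
			   unsigned char *buf, uint64_t size, bool out)
{
	uint64_t rel, n;

	/* Assumed precondition: offset falls inside this chunk. */
	assert(offset >= c->offset && offset - c->offset < c->size);

	rel = offset - c->offset;
	n = c->size - rel < size ? c->size - rel : size;

	if (out)
		memcpy(buf, c->data + rel, n);
	else
		memcpy(c->data + rel, buf, n);
	return n;
}
```

Because the return value can be short, a caller has to loop and advance by the returned count to cover a request that spans several chunks; only the in-memory chunk changes on a write, nothing is written back to a file in this sketch.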
* [PATCH 28/63] perf map: Check if the map still has some refcounts on exit 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (26 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 27/63] perf dso: Add dso__data_write_cache_addr() Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 29/63] perf map: Allow map__next() to receive a NULL arg Arnaldo Carvalho de Melo ` (34 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Andi Kleen From: Arnaldo Carvalho de Melo <acme@redhat.com> We were checking just if it was still on some rb tree, but that is not the only way that this map can still have references, map->refcnt is there exactly for this, use it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-hany65tbeavsax7n3xvwl9pc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c index eec9b282c047..c9ba49566981 100644 --- a/tools/perf/util/map.c +++ b/tools/perf/util/map.c @@ -288,7 +288,7 @@ bool map__has_symbols(const struct map *map) static void map__exit(struct map *map) { - BUG_ON(!RB_EMPTY_NODE(&map->rb_node)); + BUG_ON(refcount_read(&map->refcnt) != 0); dso__zput(map->dso); } -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
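The check introduced above can be pictured with a toy reference-counted object. This is only a sketch: it uses a plain int where perf uses refcount_t, assert() where perf uses BUG_ON(), and the toy_map names are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object; perf's struct map has many more members. */
struct toy_map {
	int refcnt;	/* plain int: this sketch is single-threaded */
};

static struct toy_map *toy_map__new(void)
{
	struct toy_map *map = calloc(1, sizeof(*map));

	if (map)
		map->refcnt = 1;	/* the creator holds the first reference */
	return map;
}

static struct toy_map *toy_map__get(struct toy_map *map)
{
	map->refcnt++;
	return map;
}

static void toy_map__exit(struct toy_map *map)
{
	/* tearing down an object that still has holders is a bug */
	assert(map->refcnt == 0);
}

/* Drop one reference; returns 1 if the object was destroyed. */
static int toy_map__put(struct toy_map *map)
{
	if (--map->refcnt > 0)
		return 0;

	toy_map__exit(map);
	free(map);
	return 1;
}
```

The count covers every holder, whether or not the object is still linked into a container, which is the case the old rb tree test could not see.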
* [PATCH 29/63] perf map: Allow map__next() to receive a NULL arg 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (27 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 28/63] perf map: Check if the map still has some refcounts on exit Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 30/63] perf maps: Add for_each_entry()/_safe() iterators Arnaldo Carvalho de Melo ` (33 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Andi Kleen From: Arnaldo Carvalho de Melo <acme@redhat.com> Just like free(), return NULL in that case, will simplify the for_each_entry_safe() iterators. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-pbde2ucn49khnrebclys9pny@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/map.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c index c9ba49566981..86d8d187f872 100644 --- a/tools/perf/util/map.c +++ b/tools/perf/util/map.c @@ -1007,7 +1007,7 @@ struct map *maps__first(struct maps *maps) return NULL; } -struct map *map__next(struct map *map) +static struct map *__map__next(struct map *map) { struct rb_node *next = rb_next(&map->rb_node); @@ -1016,6 +1016,11 @@ struct map *map__next(struct map *map) return NULL; } +struct map *map__next(struct map *map) +{ + return map ? __map__next(map) : NULL; +} + struct kmap *__map__kmap(struct map *map) { if (!map->dso || !map->dso->kernel) -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
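The split into an unchecked helper and a NULL-tolerant wrapper can be sketched on a plain singly linked list. The toy_node names below are hypothetical; perf walks rb tree nodes instead.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	int value;
	struct toy_node *next;
};

/* Unchecked helper: the caller must pass a valid node. */
static struct toy_node *toy_node__next_unchecked(struct toy_node *node)
{
	assert(node != NULL);
	return node->next;
}

/* Public entry point: NULL in, NULL out. */
static struct toy_node *toy_node__next(struct toy_node *node)
{
	return node ? toy_node__next_unchecked(node) : NULL;
}

/* Peek two entries ahead without any NULL checks at the call site. */
static struct toy_node *toy_node__skip_two(struct toy_node *node)
{
	return toy_node__next(toy_node__next(node));
}
```

Because the wrapper accepts NULL, a loop can compute "the entry after the current one" in its initializer and in its increment expression even when the container is empty or the walk has just reached the end.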
* [PATCH 30/63] perf maps: Add for_each_entry()/_safe() iterators 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (28 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 29/63] perf map: Allow map__next() to receive a NULL arg Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 31/63] perf map_groups: Introduce for_each_entry() and for_each_entry_safe() iterators Arnaldo Carvalho de Melo ` (32 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Andi Kleen From: Arnaldo Carvalho de Melo <acme@redhat.com> To reduce boilerplate, provide a more compact form using an idiom present in other trees of data structures. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-59gmq4kg1r68ou1wknyjl78x@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/arch/x86/util/event.c | 2 +- tools/perf/builtin-report.c | 6 ++-- tools/perf/tests/vmlinux-kallsyms.c | 6 ++-- tools/perf/util/machine.c | 2 +- tools/perf/util/map.c | 56 +++++++++++++++++------------ tools/perf/util/map_groups.h | 15 ++++++++ tools/perf/util/probe-event.c | 2 +- tools/perf/util/symbol.c | 16 ++++----- tools/perf/util/synthetic-events.c | 2 +- tools/perf/util/thread.c | 2 +- 10 files changed, 65 insertions(+), 44 deletions(-) diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c index d357c625c09f..d1044df7c0d7 100644 --- a/tools/perf/arch/x86/util/event.c +++ b/tools/perf/arch/x86/util/event.c @@ -29,7 +29,7 @@ int perf_event__synthesize_extra_kmaps(struct perf_tool *tool, return -1; } - for (pos = 
maps__first(maps); pos; pos = map__next(pos)) { + maps__for_each_entry(maps, pos) { struct kmap *kmap; size_t size; diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 7accaf8ef689..3bbad039abf2 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -727,11 +727,9 @@ static struct task *tasks_list(struct task *task, struct machine *machine) static size_t maps__fprintf_task(struct maps *maps, int indent, FILE *fp) { size_t printed = 0; - struct rb_node *nd; - - for (nd = rb_first(&maps->entries); nd; nd = rb_next(nd)) { - struct map *map = rb_entry(nd, struct map, rb_node); + struct map *map; + maps__for_each_entry(maps, map) { printed += fprintf(fp, "%*s %" PRIx64 "-%" PRIx64 " %c%c%c%c %08" PRIx64 " %" PRIu64 " %s\n", indent, "", map->start, map->end, map->prot & PROT_READ ? 'r' : '-', diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c index aa296ffea6d1..ff649078da9a 100644 --- a/tools/perf/tests/vmlinux-kallsyms.c +++ b/tools/perf/tests/vmlinux-kallsyms.c @@ -182,7 +182,7 @@ int test__vmlinux_matches_kallsyms(struct test *test __maybe_unused, int subtest header_printed = false; - for (map = maps__first(maps); map; map = map__next(map)) { + maps__for_each_entry(maps, map) { struct map * /* * If it is the kernel, kallsyms is always "[kernel.kallsyms]", while @@ -207,7 +207,7 @@ int test__vmlinux_matches_kallsyms(struct test *test __maybe_unused, int subtest header_printed = false; - for (map = maps__first(maps); map; map = map__next(map)) { + maps__for_each_entry(maps, map) { struct map *pair; mem_start = vmlinux_map->unmap_ip(vmlinux_map, map->start); @@ -237,7 +237,7 @@ int test__vmlinux_matches_kallsyms(struct test *test __maybe_unused, int subtest maps = machine__kernel_maps(&kallsyms); - for (map = maps__first(maps); map; map = map__next(map)) { + maps__for_each_entry(maps, map) { if (!map->priv) { if (!header_printed) { pr_info("WARN: Maps only in kallsyms:\n"); diff --git 
a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 70a9f8716a4b..24d9e284daad 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1057,7 +1057,7 @@ int machine__map_x86_64_entry_trampolines(struct machine *machine, * In the vmlinux case, pgoff is a virtual address which must now be * mapped to a vmlinux offset. */ - for (map = maps__first(maps); map; map = map__next(map)) { + maps__for_each_entry(maps, map) { struct kmap *kmap = __map__kmap(map); struct map *dest_map; diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c index 86d8d187f872..466c9b035e19 100644 --- a/tools/perf/util/map.c +++ b/tools/perf/util/map.c @@ -594,28 +594,20 @@ void map_groups__insert(struct map_groups *mg, struct map *map) static void __maps__purge(struct maps *maps) { - struct rb_root *root = &maps->entries; - struct rb_node *next = rb_first(root); + struct map *pos, *next; - while (next) { - struct map *pos = rb_entry(next, struct map, rb_node); - - next = rb_next(&pos->rb_node); - rb_erase_init(&pos->rb_node, root); + maps__for_each_entry_safe(maps, pos, next) { + rb_erase_init(&pos->rb_node, &maps->entries); map__put(pos); } } static void __maps__purge_names(struct maps *maps) { - struct rb_root *root = &maps->names; - struct rb_node *next = rb_first(root); - - while (next) { - struct map *pos = rb_entry(next, struct map, rb_node_name); + struct map *pos, *next; - next = rb_next(&pos->rb_node_name); - rb_erase_init(&pos->rb_node_name, root); + maps__for_each_entry_by_name_safe(maps, pos, next) { + rb_erase_init(&pos->rb_node_name, &maps->names); map__put(pos); } } @@ -687,13 +679,11 @@ struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name, struct map **mapp) { struct symbol *sym; - struct rb_node *nd; + struct map *pos; down_read(&maps->lock); - for (nd = rb_first(&maps->entries); nd; nd = rb_next(nd)) { - struct map *pos = rb_entry(nd, struct map, rb_node); - + maps__for_each_entry(maps, pos) { sym = 
map__find_symbol_by_name(pos, name); if (sym == NULL) @@ -739,12 +729,11 @@ int map_groups__find_ams(struct addr_map_symbol *ams) static size_t maps__fprintf(struct maps *maps, FILE *fp) { size_t printed = 0; - struct rb_node *nd; + struct map *pos; down_read(&maps->lock); - for (nd = rb_first(&maps->entries); nd; nd = rb_next(nd)) { - struct map *pos = rb_entry(nd, struct map, rb_node); + maps__for_each_entry(maps, pos) { printed += fprintf(fp, "Map:"); printed += map__fprintf(pos, fp); if (verbose > 2) { @@ -889,7 +878,7 @@ int map_groups__clone(struct thread *thread, struct map_groups *parent) down_read(&maps->lock); - for (map = maps__first(maps); map; map = map__next(map)) { + maps__for_each_entry(maps, map) { struct map *new = map__clone(map); if (new == NULL) goto out_unlock; @@ -1021,6 +1010,29 @@ struct map *map__next(struct map *map) return map ? __map__next(map) : NULL; } +struct map *maps__first_by_name(struct maps *maps) +{ + struct rb_node *first = rb_first(&maps->names); + + if (first) + return rb_entry(first, struct map, rb_node_name); + return NULL; +} + +static struct map *__map__next_by_name(struct map *map) +{ + struct rb_node *next = rb_next(&map->rb_node_name); + + if (next) + return rb_entry(next, struct map, rb_node_name); + return NULL; +} + +struct map *map__next_by_name(struct map *map) +{ + return map ? 
__map__next_by_name(map) : NULL; +} + struct kmap *__map__kmap(struct map *map) { if (!map->dso || !map->dso->kernel) diff --git a/tools/perf/util/map_groups.h b/tools/perf/util/map_groups.h index 77252e14008f..ce3ade32babd 100644 --- a/tools/perf/util/map_groups.h +++ b/tools/perf/util/map_groups.h @@ -25,7 +25,22 @@ void maps__remove(struct maps *maps, struct map *map); struct map *maps__find(struct maps *maps, u64 addr); struct map *maps__first(struct maps *maps); struct map *map__next(struct map *map); + +#define maps__for_each_entry(maps, map) \ + for (map = maps__first(maps); map; map = map__next(map)) + +#define maps__for_each_entry_safe(maps, map, next) \ + for (map = maps__first(maps), next = map__next(map); map; map = next, next = map__next(map)) + struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name, struct map **mapp); +struct map *maps__first_by_name(struct maps *maps); +struct map *map__next_by_name(struct map *map); + +#define maps__for_each_entry_by_name(maps, map) \ + for (map = maps__first_by_name(maps); map; map = map__next_by_name(map)) + +#define maps__for_each_entry_by_name_safe(maps, map, next) \ + for (map = maps__first_by_name(maps), next = map__next_by_name(map); map; map = next, next = map__next_by_name(map)) struct map_groups { struct maps maps; diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index 91cab5f669d2..e29948b8fcab 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -153,7 +153,7 @@ static struct map *kernel_get_module_map(const char *module) return map__get(pos); } - for (pos = maps__first(maps); pos; pos = map__next(pos)) { + maps__for_each_entry(maps, pos) { /* short_name is "[module]" */ if (strncmp(pos->dso->short_name + 1, module, pos->dso->short_name_len - 2) == 0 && diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index a8f80e427674..042140fc4d36 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -242,28 
+242,24 @@ void symbols__fixup_end(struct rb_root_cached *symbols) void map_groups__fixup_end(struct map_groups *mg) { struct maps *maps = &mg->maps; - struct map *next, *curr; + struct map *prev = NULL, *curr; down_write(&maps->lock); - curr = maps__first(maps); - if (curr == NULL) - goto out_unlock; + maps__for_each_entry(maps, curr) { + if (prev != NULL && !prev->end) + prev->end = curr->start; - for (next = map__next(curr); next; next = map__next(curr)) { - if (!curr->end) - curr->end = next->start; - curr = next; + prev = curr; } /* * We still haven't the actual symbols, so guess the * last map final address. */ - if (!curr->end) + if (curr && !curr->end) curr->end = ~0ULL; -out_unlock: up_write(&maps->lock); } diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c index 807cbca403a7..cfa3c9f67141 100644 --- a/tools/perf/util/synthetic-events.c +++ b/tools/perf/util/synthetic-events.c @@ -438,7 +438,7 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t else event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL; - for (pos = maps__first(maps); pos; pos = map__next(pos)) { + maps__for_each_entry(maps, pos) { size_t size; if (!__map__is_kmodule(pos)) diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c index b64e9e049636..0a277a920970 100644 --- a/tools/perf/util/thread.c +++ b/tools/perf/util/thread.c @@ -350,7 +350,7 @@ static int __thread__prepare_access(struct thread *thread) down_read(&maps->lock); - for (map = maps__first(maps); map; map = map__next(map)) { + maps__for_each_entry(maps, map) { err = unwind__prepare_access(thread->mg, map, &initialized); if (err || initialized) break; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
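The two iterator macros from the patch above can be tried out on a toy container. The sketch below assumes a singly linked list rather than perf's rb tree, and all toy_* names are made up for illustration; only the shape of the macros follows the patch.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_map {
	unsigned long start;
	struct toy_map *next;
};

struct toy_maps {
	struct toy_map *first;
};

static struct toy_map *toy_maps__first(struct toy_maps *maps)
{
	return maps->first;
}

/* NULL-tolerant, so the _safe macro may call it on an empty list. */
static struct toy_map *toy_map__next(struct toy_map *map)
{
	return map ? map->next : NULL;
}

#define toy_maps__for_each_entry(maps, map) \
	for (map = toy_maps__first(maps); map; map = toy_map__next(map))

/* 'next' is fetched before the body runs, so the body may free 'map'. */
#define toy_maps__for_each_entry_safe(maps, map, next) \
	for (map = toy_maps__first(maps), next = toy_map__next(map); map; \
	     map = next, next = toy_map__next(map))

static int toy_maps__add(struct toy_maps *maps, unsigned long start)
{
	struct toy_map *map = malloc(sizeof(*map));

	if (!map)
		return -1;
	map->start = start;
	map->next = maps->first;	/* push at the head */
	maps->first = map;
	return 0;
}

static size_t toy_maps__count(struct toy_maps *maps)
{
	struct toy_map *pos;
	size_t nr = 0;

	toy_maps__for_each_entry(maps, pos)
		nr++;
	return nr;
}

/* Free every entry; returns how many were freed. */
static size_t toy_maps__purge(struct toy_maps *maps)
{
	struct toy_map *pos, *next;
	size_t freed = 0;

	toy_maps__for_each_entry_safe(maps, pos, next) {
		free(pos);	/* safe: the successor is already in 'next' */
		freed++;
	}
	maps->first = NULL;
	return freed;
}
```

The plain form is enough for read-only walks such as toy_maps__count(); the _safe form is for loops that release the current entry, as toy_maps__purge() does here and as __maps__purge() does in the patch.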
* [PATCH 31/63] perf map_groups: Introduce for_each_entry() and for_each_entry_safe() iterators 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (29 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 30/63] perf maps: Add for_each_entry()/_safe() iterators Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 32/63] libsubcmd: Move EXTRA_FLAGS to the end to allow overriding existing flags Arnaldo Carvalho de Melo ` (31 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Andi Kleen From: Arnaldo Carvalho de Melo <acme@redhat.com> To reduce boilerplate, providing a more compact form to iterate over the maps in a map_group. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-gc3go6fmdn30twusg91t2q56@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/tests/map_groups.c | 9 ++++----- tools/perf/util/map_groups.h | 9 ++++----- tools/perf/util/symbol.c | 24 ++++-------------------- tools/perf/util/vdso.c | 4 ++-- 4 files changed, 14 insertions(+), 32 deletions(-) diff --git a/tools/perf/tests/map_groups.c b/tools/perf/tests/map_groups.c index 594fdaca4f71..b52adad55f8d 100644 --- a/tools/perf/tests/map_groups.c +++ b/tools/perf/tests/map_groups.c @@ -18,17 +18,16 @@ static int check_maps(struct map_def *merged, unsigned int size, struct map_grou struct map *map; unsigned int i = 0; - map = map_groups__first(mg); - while (map) { + map_groups__for_each_entry(mg, map) { + if (i > 0) + TEST_ASSERT_VAL("less maps expected", (map && i < size) || (!map && i == size)); + TEST_ASSERT_VAL("wrong 
map start", map->start == merged[i].start); TEST_ASSERT_VAL("wrong map end", map->end == merged[i].end); TEST_ASSERT_VAL("wrong map name", !strcmp(map->dso->name, merged[i].name)); TEST_ASSERT_VAL("wrong map refcnt", refcount_read(&map->refcnt) == 2); i++; - map = map_groups__next(map); - - TEST_ASSERT_VAL("less maps expected", (map && i < size) || (!map && i == size)); } return TEST_OK; diff --git a/tools/perf/util/map_groups.h b/tools/perf/util/map_groups.h index ce3ade32babd..bfbdbf5a443a 100644 --- a/tools/perf/util/map_groups.h +++ b/tools/perf/util/map_groups.h @@ -89,12 +89,11 @@ static inline struct map *map_groups__find(struct map_groups *mg, u64 addr) return maps__find(&mg->maps, addr); } -struct map *map_groups__first(struct map_groups *mg); +#define map_groups__for_each_entry(mg, map) \ + for (map = maps__first(&mg->maps); map; map = map__next(map)) -static inline struct map *map_groups__next(struct map *map) -{ - return map__next(map); -} +#define map_groups__for_each_entry_safe(mg, map, next) \ + for (map = maps__first(&mg->maps), next = map__next(map); map; map = next, next = map__next(map)) struct symbol *map_groups__find_symbol(struct map_groups *mg, u64 addr, struct map **mapp); struct symbol *map_groups__find_symbol_by_name(struct map_groups *mg, const char *name, struct map **mapp); diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index 042140fc4d36..a4bd61cbc2a0 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -1049,11 +1049,6 @@ int compare_proc_modules(const char *from, const char *to) return ret; } -struct map *map_groups__first(struct map_groups *mg) -{ - return maps__first(&mg->maps); -} - static int do_validate_kcore_modules(const char *filename, struct map_groups *kmaps) { @@ -1065,13 +1060,10 @@ static int do_validate_kcore_modules(const char *filename, if (err) return err; - old_map = map_groups__first(kmaps); - while (old_map) { - struct map *next = map_groups__next(old_map); + 
map_groups__for_each_entry(kmaps, old_map) { struct module_info *mi; if (!__map__is_kmodule(old_map)) { - old_map = next; continue; } @@ -1081,8 +1073,6 @@ static int do_validate_kcore_modules(const char *filename, err = -EINVAL; goto out; } - - old_map = next; } out: delete_modules(&modules); @@ -1185,9 +1175,7 @@ int map_groups__merge_in(struct map_groups *kmaps, struct map *new_map) struct map *old_map; LIST_HEAD(merged); - for (old_map = map_groups__first(kmaps); old_map; - old_map = map_groups__next(old_map)) { - + map_groups__for_each_entry(kmaps, old_map) { /* no overload with this one */ if (new_map->end < old_map->start || new_map->start >= old_map->end) @@ -1260,7 +1248,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map, { struct map_groups *kmaps = map__kmaps(map); struct kcore_mapfn_data md; - struct map *old_map, *new_map, *replacement_map = NULL; + struct map *old_map, *new_map, *replacement_map = NULL, *next; struct machine *machine; bool is_64_bit; int err, fd; @@ -1307,10 +1295,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map, } /* Remove old maps */ - old_map = map_groups__first(kmaps); - while (old_map) { - struct map *next = map_groups__next(old_map); - + map_groups__for_each_entry_safe(kmaps, old_map, next) { /* * We need to preserve eBPF maps even if they are * covered by kcore, because we need to access @@ -1318,7 +1303,6 @@ static int dso__load_kcore(struct dso *dso, struct map *map, */ if (old_map != map && !__map__is_bpf_prog(old_map)) map_groups__remove(kmaps, old_map); - old_map = next; } machine->trampolines_mapped = false; diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c index ba4b4395f35d..6e00793c10ee 100644 --- a/tools/perf/util/vdso.c +++ b/tools/perf/util/vdso.c @@ -142,9 +142,9 @@ static enum dso_type machine__thread_dso_type(struct machine *machine, struct thread *thread) { enum dso_type dso_type = DSO__TYPE_UNKNOWN; - struct map *map = map_groups__first(thread->mg); + struct map *map; - 
for (; map ; map = map_groups__next(map)) { + map_groups__for_each_entry(thread->mg, map) { struct dso *dso = map->dso; if (!dso || dso->long_name[0] != '/') continue; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 32/63] libsubcmd: Move EXTRA_FLAGS to the end to allow overriding existing flags 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (30 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 31/63] perf map_groups: Introduce for_each_entry() and for_each_entry_safe() iterators Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 33/63] libsubcmd: Use -O0 with DEBUG=1 Arnaldo Carvalho de Melo ` (30 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, James Clark, James Clark, Adrian Hunter, Ian Rogers, Josh Poimboeuf, nd, Arnaldo Carvalho de Melo From: James Clark <James.Clark@arm.com> Move EXTRA_WARNINGS and EXTRA_FLAGS to the end of the compilation line, otherwise they cannot be used to override the default values. 
Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: nd <nd@arm.com> Link: http://lore.kernel.org/lkml/20191028113340.4282-1-james.clark@arm.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/lib/subcmd/Makefile | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/lib/subcmd/Makefile b/tools/lib/subcmd/Makefile index 5b2cd5e58df0..352c6062deba 100644 --- a/tools/lib/subcmd/Makefile +++ b/tools/lib/subcmd/Makefile @@ -19,8 +19,7 @@ MAKEFLAGS += --no-print-directory LIBFILE = $(OUTPUT)libsubcmd.a -CFLAGS := $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -fPIC +CFLAGS := -ggdb3 -Wall -Wextra -std=gnu99 -fPIC ifeq ($(DEBUG),0) ifeq ($(feature-fortify-source), 1) @@ -43,6 +42,8 @@ CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE CFLAGS += -I$(srctree)/tools/include/ +CFLAGS += $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) + SUBCMD_IN := $(OUTPUT)libsubcmd-in.o all: -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 33/63] libsubcmd: Use -O0 with DEBUG=1 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (31 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 32/63] libsubcmd: Move EXTRA_FLAGS to the end to allow overriding existing flags Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 34/63] perf tools: Splice events onto evlist even on error Arnaldo Carvalho de Melo ` (29 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, James Clark, James Clark, Adrian Hunter, Ian Rogers, Josh Poimboeuf, nd, Arnaldo Carvalho de Melo From: James Clark <James.Clark@arm.com> When a 'make DEBUG=1' build is done, the command parser is still built with -O6 and is hard to step through, fix it making it use -O0 in that case. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: nd <nd@arm.com> Link: http://lore.kernel.org/lkml/20191028113340.4282-1-james.clark@arm.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/lib/subcmd/Makefile | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/lib/subcmd/Makefile b/tools/lib/subcmd/Makefile index 352c6062deba..1c777a72bb39 100644 --- a/tools/lib/subcmd/Makefile +++ b/tools/lib/subcmd/Makefile @@ -27,7 +27,9 @@ ifeq ($(DEBUG),0) endif endif -ifeq ($(CC_NO_CLANG), 0) +ifeq ($(DEBUG),1) + CFLAGS += -O0 +else ifeq ($(CC_NO_CLANG), 0) CFLAGS += -O3 else CFLAGS += -O6 -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 34/63] perf tools: Splice events onto evlist even on error 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (32 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 33/63] libsubcmd: Use -O0 with DEBUG=1 Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 36/63] perf vendor events intel: Update all the Intel JSON metrics from TMAM 3.6 Arnaldo Carvalho de Melo ` (28 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Ian Rogers, Adrian Hunter, Alexander Shishkin, Alexei Starovoitov, Andi Kleen, Daniel Borkmann, Jin Yao, John Garry, Kan Liang, Mark Rutland, Martin KaFai Lau, Peter Zijlstra, Song Liu, Stephane Eranian, Yonghong Song, bpf, cl From: Ian Rogers <irogers@google.com> If event parsing fails the event list is leaked, instead splice the list onto the out result and let the caller cleanup. An example input for parse_events found by libFuzzer that reproduces this memory leak is 'm{'. 
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191025180827.191916-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/parse-events.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index db882f630f7e..d36b8129b27a 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -1927,15 +1927,20 @@ int parse_events(struct evlist *evlist, const char *str, ret = parse_events__scanner(str, &parse_state, PE_START_EVENTS); perf_pmu__parse_cleanup(); + + if (!ret && list_empty(&parse_state.list)) { + WARN_ONCE(true, "WARNING: event parser found nothing\n"); + return -1; + } + + /* + * Add list to the evlist even with errors to allow callers to clean up. 
+ */ + perf_evlist__splice_list_tail(evlist, &parse_state.list); + if (!ret) { struct evsel *last; - if (list_empty(&parse_state.list)) { - WARN_ONCE(true, "WARNING: event parser found nothing\n"); - return -1; - } - - perf_evlist__splice_list_tail(evlist, &parse_state.list); evlist->nr_groups += parse_state.nr_groups; last = evlist__last(evlist); last->cmdline_group_boundary = true; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
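The ownership rule in the patch above, where a partial result is handed to the caller even on failure, can be sketched with a toy parser. It assumes a comma-separated event string in which an empty name is the only syntax error; the toy_* types and functions are hypothetical and far simpler than perf's evlist code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_event {
	char name[16];
	struct toy_event *next;
};

struct toy_evlist {
	struct toy_event *head, *tail;
	int nr;
};

static int toy_evlist__add(struct toy_evlist *list, const char *name, size_t len)
{
	struct toy_event *ev = calloc(1, sizeof(*ev));

	if (!ev)
		return -1;
	if (len >= sizeof(ev->name))
		len = sizeof(ev->name) - 1;	/* truncate, keep the NUL */
	memcpy(ev->name, name, len);

	if (list->tail)
		list->tail->next = ev;
	else
		list->head = ev;
	list->tail = ev;
	list->nr++;
	return 0;
}

/* Move every entry of 'src' to the end of 'dst', leaving 'src' empty. */
static void toy_evlist__splice_tail(struct toy_evlist *dst, struct toy_evlist *src)
{
	if (!src->head)
		return;
	if (dst->tail)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	dst->nr += src->nr;
	src->head = src->tail = NULL;
	src->nr = 0;
}

static int toy_parse_events(struct toy_evlist *evlist, const char *str)
{
	struct toy_evlist parsed = { NULL, NULL, 0 };
	int ret = 0;

	while (*str) {
		size_t len = strcspn(str, ",");

		if (len == 0 || toy_evlist__add(&parsed, str, len)) {
			ret = -1;	/* empty name or allocation failure */
			break;
		}
		str += len;
		if (*str == ',')
			str++;
	}

	if (!ret && !parsed.head)
		return -1;	/* nothing parsed and nothing to hand over */

	/* splice even on error: the caller now owns the partial result */
	toy_evlist__splice_tail(evlist, &parsed);
	return ret;
}

/* Caller-side cleanup; returns the number of events freed. */
static int toy_evlist__purge(struct toy_evlist *list)
{
	struct toy_event *ev = list->head, *next;
	int freed = 0;

	for (; ev; ev = next) {
		next = ev->next;
		free(ev);
		freed++;
	}
	list->head = list->tail = NULL;
	list->nr = 0;
	return freed;
}
```

With this layout there is a single cleanup path: whatever the parser allocated ends up on the caller's list, and toy_evlist__purge() releases it whether or not parsing succeeded.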
* [PATCH 36/63] perf vendor events intel: Update all the Intel JSON metrics from TMAM 3.6. 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (33 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 34/63] perf tools: Splice events onto evlist even on error Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 37/63] perf env: Add perf_env__numa_node() Arnaldo Carvalho de Melo ` (27 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Haiyan Song, Kan Liang, Peter Zijlstra, Alexander Shishkin, Andi Kleen, Jin Yao, Arnaldo Carvalho de Melo From: Haiyan Song <haiyanx.song@intel.com> New Metrics: - DSB_Switches: fraction of cycles CPU was stalled due to switches from DSB to MITE pipeline [all] - L2_Evictions_{Silent|NonSilent}_PKI: L2 {silent|non silent} evictions rate per Kilo instruction [SKX+] - IpFarBranch - Instructions per Far Branch Other Enhancements & fixes: - KBLR/CFL & CLX move to separate columns (no column sharing via if #model) - Re-organized/renamed Metric Group Signed-off-by: Haiyan Song <haiyanx.song@intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Link: http://lore.kernel.org/lkml/20191030082308.10919-1-haiyanx.song@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- .../arch/x86/broadwell/bdw-metrics.json | 178 +++++++-------- .../arch/x86/broadwellx/bdx-metrics.json | 184 +++++++-------- .../arch/x86/cascadelakex/clx-metrics.json | 210 ++++++++++-------- .../arch/x86/haswell/hsw-metrics.json | 164 +++++++------- 
.../arch/x86/haswellx/hsx-metrics.json | 170 +++++++------- .../arch/x86/ivybridge/ivb-metrics.json | 170 +++++++------- .../arch/x86/ivytown/ivt-metrics.json | 172 +++++++------- .../arch/x86/jaketown/jkt-metrics.json | 114 +++++----- .../arch/x86/sandybridge/snb-metrics.json | 112 +++++----- .../arch/x86/skylake/skl-metrics.json | 188 ++++++++-------- .../arch/x86/skylakex/skx-metrics.json | 204 +++++++++-------- 11 files changed, 954 insertions(+), 912 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json b/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json index 212b117a8ffb..bc7151d639d7 100644 --- a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json +++ b/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json @@ -1,352 +1,352 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. 
For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. 
bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation.
For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Instruction per taken branch", - "MetricGroup": "Branches;PGO", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fetch_BW;PGO", "MetricName": "IpTB" }, { - "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Branch instructions per taken branch. ", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "MetricGroup": "Branches;PGO", "MetricName": "BpTB" }, { - "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) )", + "MetricGroup": "PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", + "MetricGroup": "DSB;Fetch_BW", 
"MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { + "BriefDescription": "Instructions per Load (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS", - "BriefDescription": "Instructions per Load (lower number means loads are more frequent)", - "MetricGroup": "Instruction_Type;L1_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpL" }, { + "BriefDescription": "Instructions per Store (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES", - "BriefDescription": "Instructions per Store", - "MetricGroup": "Instruction_Type;Store_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpS" }, { + "BriefDescription": "Instructions per Branch (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", - "BriefDescription": "Instructions per Branch", - "MetricGroup": 
"Branches;Instruction_Type;Port_5;Port_6", + "MetricGroup": "Branches;Instruction_Type", "MetricName": "IpB" }, { + "BriefDescription": "Instruction per (near) call (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", - "BriefDescription": "Instruction per (near) call", "MetricGroup": "Branches", "MetricName": "IpCall" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles", "MetricGroup": "FLOPS", "MetricName": "FLOPc" }, { - "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "FLOPS_SMT", "MetricName": "FLOPc_SMT" }, { - "MetricExpr": "UOPS_EXECUTED.THREAD / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "UOPS_EXECUTED.THREAD / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per non-speculative branch misprediction (jeclear)", "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) * (12 * ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) ) * (4 * cycles) / BR_MISP_RETIRED.ALL_BRANCHES", -
"BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per branch misprediction (jeclear and baclear)", - "MetricGroup": "Branch_Mispredicts", + "MetricGroup": "BrMispredicts", "MetricName": "Branch_Misprediction_Cost" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per non-speculative branch misprediction (jeclear)", "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) * (12 * ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) / BR_MISP_RETIRED.ALL_BRANCHES", - "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per branch misprediction (jeclear and baclear)", - "MetricGroup": "Branch_Mispredicts_SMT", + "MetricGroup": "BrMispredicts_SMT", "MetricName": "Branch_Misprediction_Cost_SMT" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)", - "MetricGroup": "Branch_Mispredicts", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "BrMispredicts", "MetricName": "IpMispredict" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the
Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", "BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)", + "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", "MetricGroup": "Memory_Bound;Memory_Lat", "MetricName": "Load_Miss_Real_Latency" }, { + "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)", "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES", - "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. 
Per-thread)", "MetricGroup": "Memory_Bound;Memory_BW", "MetricName": "MLP" }, { - "MetricExpr": "( cpu@ITLB_MISSES.WALK_DURATION\\,cmask\\=1@ + cpu@DTLB_LOAD_MISSES.WALK_DURATION\\,cmask\\=1@ + cpu@DTLB_STORE_MISSES.WALK_DURATION\\,cmask\\=1@ + 7 * ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + ITLB_MISSES.WALK_COMPLETED ) ) / cycles", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( cpu@ITLB_MISSES.WALK_DURATION\\,cmask\\=1@ + cpu@DTLB_LOAD_MISSES.WALK_DURATION\\,cmask\\=1@ + cpu@DTLB_STORE_MISSES.WALK_DURATION\\,cmask\\=1@ + 7 * ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + ITLB_MISSES.WALK_COMPLETED ) ) / cycles", "MetricGroup": "TLB", "MetricName": "Page_Walks_Utilization" }, { - "MetricExpr": "( cpu@ITLB_MISSES.WALK_DURATION\\,cmask\\=1@ + cpu@DTLB_LOAD_MISSES.WALK_DURATION\\,cmask\\=1@ + cpu@DTLB_STORE_MISSES.WALK_DURATION\\,cmask\\=1@ + 7 * ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + ITLB_MISSES.WALK_COMPLETED ) ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( cpu@ITLB_MISSES.WALK_DURATION\\,cmask\\=1@ + cpu@DTLB_LOAD_MISSES.WALK_DURATION\\,cmask\\=1@ + cpu@DTLB_STORE_MISSES.WALK_DURATION\\,cmask\\=1@ + 7 * ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + ITLB_MISSES.WALK_COMPLETED ) ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "TLB_SMT", "MetricName": "Page_Walks_Utilization_SMT" }, { - "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]", + "MetricExpr": "64 * 
L1D.REPLACEMENT / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L1D_Cache_Fill_BW" }, { - "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L2_Cache_Fill_BW" }, { - "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Fill_BW" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L1MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI" }, { - "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI_All" }, { - "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY", "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / 
INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2HPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / INST_RETIRED.ANY", "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L3MPKI" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { - "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / duration_time", "BriefDescription": "Giga Floating Point Operations Per Second", + "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / duration_time", "MetricGroup": "FLOPS;Summary", "MetricName": "GFLOPs" }, { - "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads 
were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency 
percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per package", "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json index c6f9762f32c0..113d19e92678 100644 --- a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json @@ -1,370 +1,370 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. 
For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. 
bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation.
For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Instruction per taken branch", - "MetricGroup": "Branches;PGO", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fetch_BW;PGO", "MetricName": "IpTB" }, { - "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Branch instructions per taken branch. ", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "MetricGroup": "Branches;PGO", "MetricName": "BpTB" }, { - "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) )", + "MetricGroup": "PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", + "MetricGroup": "DSB;Fetch_BW", 
"MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { + "BriefDescription": "Instructions per Load (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS", - "BriefDescription": "Instructions per Load (lower number means loads are more frequent)", - "MetricGroup": "Instruction_Type;L1_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpL" }, { + "BriefDescription": "Instructions per Store (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES", - "BriefDescription": "Instructions per Store", - "MetricGroup": "Instruction_Type;Store_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpS" }, { + "BriefDescription": "Instructions per Branch (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", - "BriefDescription": "Instructions per Branch", - "MetricGroup": 
"Branches;Instruction_Type;Port_5;Port_6", + "MetricGroup": "Branches;Instruction_Type", "MetricName": "IpB" }, { + "BriefDescription": "Instruction per (near) call (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", - "BriefDescription": "Instruction per (near) call", "MetricGroup": "Branches", "MetricName": "IpCall" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles", "MetricGroup": "FLOPS", "MetricName": "FLOPc" }, { - "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "FLOPS_SMT", "MetricName": "FLOPc_SMT" }, { - "MetricExpr": "UOPS_EXECUTED.THREAD / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "UOPS_EXECUTED.THREAD / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per non-speculative branch misprediction (jeclear)", "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) * (12 * ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) ) * (4 * cycles) / BR_MISP_RETIRED.ALL_BRANCHES", - 
"BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per branch misprediction (jeclear and baclear)", - "MetricGroup": "Branch_Mispredicts", + "MetricGroup": "BrMispredicts", "MetricName": "Branch_Misprediction_Cost" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per non-speculative branch misprediction (jeclear)", "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) * (12 * ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * (4 * (( ( CPU_CLK_UNHALT ED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) / BR_MISP_RETIRED.ALL_BRANCHES", - "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per branch misprediction (jeclear and baclear)", - "MetricGroup": "Branch_Mispredicts_SMT", + "MetricGroup": "BrMispredicts_SMT", "MetricName": "Branch_Misprediction_Cost_SMT" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)", - "MetricGroup": "Branch_Mispredicts", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "BrMispredicts", "MetricName": "IpMispredict" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the 
Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", "BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)", + "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", "MetricGroup": "Memory_Bound;Memory_Lat", "MetricName": "Load_Miss_Real_Latency" }, { + "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)", "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES", - "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. 
Per-thread)", "MetricGroup": "Memory_Bound;Memory_BW", "MetricName": "MLP" }, { - "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION + 7 * ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + ITLB_MISSES.WALK_COMPLETED ) ) / ( 2 * cycles )", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION + 7 * ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + ITLB_MISSES.WALK_COMPLETED ) ) / ( 2 * cycles )", "MetricGroup": "TLB", "MetricName": "Page_Walks_Utilization" }, { - "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION + 7 * ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + ITLB_MISSES.WALK_COMPLETED ) ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION + 7 * ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + ITLB_MISSES.WALK_COMPLETED ) ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )", "MetricGroup": "TLB_SMT", "MetricName": "Page_Walks_Utilization_SMT" }, { - "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]", + "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L1D_Cache_Fill_BW" }, { - "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", 
"BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L2_Cache_Fill_BW" }, { - "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Fill_BW" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L1MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI" }, { - "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI_All" }, { - "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY", "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2HPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / INST_RETIRED.ANY", "BriefDescription": "L3 cache true 
misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L3MPKI" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { - "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / duration_time", "BriefDescription": "Giga Floating Point Operations Per Second", + "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / duration_time", "MetricGroup": "FLOPS;Summary", "MetricName": "GFLOPs" }, { - "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles 
spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { - "MetricExpr": "1000000000 * ( cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / cbox@event\\=0x35\\,umask\\=0x3\\,filter_opc\\=0x182@ ) / ( cbox_0@event\\=0x0@ / duration_time )", "BriefDescription": "Average latency of data read request to external memory (in nanoseconds). Accounts for demand loads and L1/L2 prefetches", + "MetricExpr": "1000000000 * ( cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / cbox@event\\=0x35\\,umask\\=0x3\\,filter_opc\\=0x182@ ) / ( cbox_0@event\\=0x0@ / duration_time )", "MetricGroup": "Memory_Lat", "MetricName": "DRAM_Read_Latency" }, { - "MetricExpr": "cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182\\,thresh\\=1@", "BriefDescription": "Average number of parallel data read requests to external memory. 
Accounts for demand loads and L1/L2 prefetches", + "MetricExpr": "cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182\\,thresh\\=1@", "MetricGroup": "Memory_BW", "MetricName": "DRAM_Parallel_Reads" }, { - "MetricExpr": "cbox_0@event\\=0x0@", "BriefDescription": "Socket actual clocks when any core is active on that socket", + "MetricExpr": "cbox_0@event\\=0x0@", "MetricGroup": "", "MetricName": "Socket_CLKS" }, { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per package", "MetricExpr": 
"(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json index a382b115633d..2ba32af9bc36 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json @@ -1,394 +1,412 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. 
Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend.
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category.
Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Instruction per taken branch", - "MetricGroup": "Branches;PGO", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fetch_BW;PGO", "MetricName": "IpTB" }, { - "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Branch instructions per taken branch. ", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "MetricGroup": "Branches;PGO", "MetricName": "BpTB" }, { - "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )", + "MetricGroup": "PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ))", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)", + 
"MetricGroup": "DSB;Fetch_BW", "MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { + "BriefDescription": "Instructions per Load (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS", - "BriefDescription": "Instructions per Load (lower number means loads are more frequent)", - "MetricGroup": "Instruction_Type;L1_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpL" }, { + "BriefDescription": "Instructions per Store (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES", - "BriefDescription": "Instructions per Store", - "MetricGroup": "Instruction_Type;Store_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpS" }, { + "BriefDescription": "Instructions per Branch (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", - "BriefDescription": "Instructions per 
Branch", - "MetricGroup": "Branches;Instruction_Type;Port_5;Port_6", + "MetricGroup": "Branches;Instruction_Type", "MetricName": "IpB" }, { + "BriefDescription": "Instruction per (near) call (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", - "BriefDescription": "Instruction per (near) call", "MetricGroup": "Branches", "MetricName": "IpCall" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / cycles", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / cycles", "MetricGroup": "FLOPS", "MetricName": "FLOPc" }, { - "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "FLOPS_SMT", "MetricName": "FLOPc_SMT" }, { - "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per non-speculative branch misprediction (jeclear)", "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + 
MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) * (( INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) ) * (4 * cycles) / BR_MISP_RETIRED.ALL_BRANCHES", - "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per branch misprediction (jeclear and baclear)", - "MetricGroup": "Branch_Mispredicts", + "MetricGroup": "BrMispredicts", "MetricName": "Branch_Misprediction_Cost" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per non-speculative branch misprediction (jeclear)", "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) * (( INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) / BR_MISP_RETIRED.ALL_BRANCHES", - "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per branch misprediction (jeclear and baclear)", - "MetricGroup": "Branch_Mispredicts_SMT", + "MetricGroup": "BrMispredicts_SMT", "MetricName": "Branch_Misprediction_Cost_SMT" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", 
"BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)", - "MetricGroup": "Branch_Mispredicts", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "BrMispredicts", "MetricName": "IpMispredict" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )", "BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)", + "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )", "MetricGroup": "Memory_Bound;Memory_Lat", "MetricName": "Load_Miss_Real_Latency" }, { + "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)", "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES", - "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. 
Per-thread)", "MetricGroup": "Memory_Bound;Memory_BW", "MetricName": "MLP" }, { - "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * cycles )", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * cycles )", "MetricGroup": "TLB", "MetricName": "Page_Walks_Utilization" }, { - "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )", "MetricGroup": "TLB_SMT", "MetricName": "Page_Walks_Utilization_SMT" }, { - "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]", + "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L1D_Cache_Fill_BW" }, { - "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L2_Cache_Fill_BW" }, { - "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 
cache [GB / sec]", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Fill_BW" }, { - "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", + "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Access_BW" }, { - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY", "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L1MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI" }, { - "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI_All" }, { - "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY", "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2HPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY", "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads", - 
"MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L3MPKI" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", + "BriefDescription": "Rate of silent evictions from the L2 cache per Kilo instruction where the evicted lines are dropped (no writeback to L3 or memory)", + "MetricExpr": "1000 * L2_LINES_OUT.SILENT / INST_RETIRED.ANY", + "MetricGroup": "", + "MetricName": "L2_Evictions_Silent_PKI" + }, + { + "BriefDescription": "Rate of non silent evictions from the L2 cache per Kilo instruction", + "MetricExpr": "1000 * L2_LINES_OUT.NON_SILENT / INST_RETIRED.ANY", + "MetricGroup": "", + "MetricName": "L2_Evictions_NonSilent_PKI" + }, + { "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { - "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / 1000000000 ) / duration_time", "BriefDescription": "Giga Floating Point Operations Per Second", + "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / 1000000000 ) / duration_time", "MetricGroup": "FLOPS;Summary", "MetricName": "GFLOPs" }, { - "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency 
Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { - "MetricExpr": "1000000000 * ( cha@event\\=0x36\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ / cha@event\\=0x35\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ ) / ( cha_0@event\\=0x0@ / duration_time )", "BriefDescription": "Average latency of data read request to external memory (in nanoseconds). 
Accounts for demand loads and L1/L2 prefetches", + "MetricExpr": "1000000000 * ( cha@event\\=0x36\\\\\\,umask\\=0x21@ / cha@event\\=0x35\\\\\\,umask\\=0x21@ ) / ( cha_0@event\\=0x0@ / duration_time )", "MetricGroup": "Memory_Lat", "MetricName": "DRAM_Read_Latency" }, { - "MetricExpr": "cha@event\\=0x36\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ / cha@event\\=0x36\\\\\\,umask\\=0x21\\\\\\,thresh\\=1\\\\\\,config\\=0x40433@", "BriefDescription": "Average number of parallel data read requests to external memory. Accounts for demand loads and L1/L2 prefetches", + "MetricExpr": "cha@event\\=0x36\\\\\\,umask\\=0x21@ / cha@event\\=0x36\\\\\\,umask\\=0x21\\\\\\,thresh\\=1@", "MetricGroup": "Memory_BW", "MetricName": "DRAM_Parallel_Reads" }, { - "MetricExpr": "( 1000000000 * ( imc@event\\=0xe0\\\\\\,umask\\=0x1@ / imc@event\\=0xe3@ ) / imc_0@event\\=0x0@ ) if 1 if 1 == 1 else 0 else 0", "BriefDescription": "Average latency of data read request to external 3D X-Point memory [in nanoseconds]. Accounts for demand loads and L1/L2 data-read prefetches", + "MetricExpr": "( 1000000000 * ( imc@event\\=0xe0\\\\\\,umask\\=0x1@ / imc@event\\=0xe3@ ) / imc_0@event\\=0x0@ ) if 1 if 0 == 1 else 0 else 0", "MetricGroup": "Memory_Lat", "MetricName": "MEM_PMM_Read_Latency" }, { - "MetricExpr": "( ( 64 * imc@event\\=0xe3@ / 1000000000 ) / duration_time ) if 1 if 1 == 1 else 0 else 0", "BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [GB / sec]", + "MetricExpr": "( ( 64 * imc@event\\=0xe3@ / 1000000000 ) / duration_time ) if 1 if 0 == 1 else 0 else 0", "MetricGroup": "Memory_BW", "MetricName": "PMM_Read_BW" }, { - "MetricExpr": "( ( 64 * imc@event\\=0xe7@ / 1000000000 ) / duration_time ) if 1 if 1 == 1 else 0 else 0", "BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes [GB / sec]", + "MetricExpr": "( ( 64 * imc@event\\=0xe7@ / 1000000000 ) / duration_time ) if 1 if 0 == 1 else 0 else 0", "MetricGroup": "Memory_BW", "MetricName": "PMM_Write_BW" }, { - 
"MetricExpr": "cha_0@event\\=0x0@", "BriefDescription": "Socket actual clocks when any core is active on that socket", + "MetricExpr": "cha_0@event\\=0x0@", "MetricGroup": "", "MetricName": "Socket_CLKS" }, { + "BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions. )", + "MetricExpr": "INST_RETIRED.ANY / ( BR_INST_RETIRED.FAR_BRANCH / 2 )", + "MetricGroup": "", + "MetricName": "IpFarBranch" + }, + { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per package", 
"MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] diff --git a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json b/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json index 21b27488b621..c80f16fde6d0 100644 --- a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json +++ b/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json @@ -1,322 +1,322 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. 
Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend.
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category.
Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Instruction per taken branch", - "MetricGroup": "Branches;PGO", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fetch_BW;PGO", "MetricName": "IpTB" }, { - "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Branch instructions per taken branch. ", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "MetricGroup": "Branches;PGO", "MetricName": "BpTB" }, { - "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) )", + "MetricGroup": "PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", + "MetricGroup": "DSB;Fetch_BW", 
"MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { + "BriefDescription": "Instructions per Load (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS", - "BriefDescription": "Instructions per Load (lower number means loads are more frequent)", - "MetricGroup": "Instruction_Type;L1_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpL" }, { + "BriefDescription": "Instructions per Store (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES", - "BriefDescription": "Instructions per Store", - "MetricGroup": "Instruction_Type;Store_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpS" }, { + "BriefDescription": "Instructions per Branch (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", - "BriefDescription": "Instructions per Branch", - "MetricGroup": 
"Branches;Instruction_Type;Port_5;Port_6", + "MetricGroup": "Branches;Instruction_Type", "MetricName": "IpB" }, { + "BriefDescription": "Instruction per (near) call (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", - "BriefDescription": "Instruction per (near) call", "MetricGroup": "Branches", "MetricName": "IpCall" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "( UOPS_EXECUTED.CORE / 2 / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@) ) if #SMT_on else UOPS_EXECUTED.CORE / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@)", "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "( UOPS_EXECUTED.CORE / 2 / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@) ) if #SMT_on else UOPS_EXECUTED.CORE / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { - 
"MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)", - "MetricGroup": "Branch_Mispredicts", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "BrMispredicts", "MetricName": "IpMispredict" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", "BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)", + "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", "MetricGroup": "Memory_Bound;Memory_Lat", "MetricName": "Load_Miss_Real_Latency" }, { + "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)", "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES", - "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. 
Per-thread)", "MetricGroup": "Memory_Bound;Memory_BW", "MetricName": "MLP" }, { - "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / cycles", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / cycles", "MetricGroup": "TLB", "MetricName": "Page_Walks_Utilization" }, { - "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "TLB_SMT", "MetricName": "Page_Walks_Utilization_SMT" }, { - "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]", + "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L1D_Cache_Fill_BW" }, { - "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L2_Cache_Fill_BW" }, { - "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", 
"MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Fill_BW" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L1MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2HPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / INST_RETIRED.ANY", "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L3MPKI" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { 
- "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency 
percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per package", "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] diff --git a/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json b/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json index e5aac148c941..e501729c3dd1 100644 --- a/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json @@ -1,340 +1,340 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. 
Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). 
Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. 
This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. 
SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Instruction per taken branch", - "MetricGroup": "Branches;PGO", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fetch_BW;PGO", "MetricName": "IpTB" }, { - "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Branch instructions per taken branch. ", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "MetricGroup": "Branches;PGO", "MetricName": "BpTB" }, { - "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) )", + "MetricGroup": "PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", + "MetricGroup": "DSB;Fetch_BW", 
"MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { + "BriefDescription": "Instructions per Load (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS", - "BriefDescription": "Instructions per Load (lower number means loads are more frequent)", - "MetricGroup": "Instruction_Type;L1_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpL" }, { + "BriefDescription": "Instructions per Store (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES", - "BriefDescription": "Instructions per Store", - "MetricGroup": "Instruction_Type;Store_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpS" }, { + "BriefDescription": "Instructions per Branch (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", - "BriefDescription": "Instructions per Branch", - "MetricGroup": 
"Branches;Instruction_Type;Port_5;Port_6", + "MetricGroup": "Branches;Instruction_Type", "MetricName": "IpB" }, { + "BriefDescription": "Instruction per (near) call (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", - "BriefDescription": "Instruction per (near) call", "MetricGroup": "Branches", "MetricName": "IpCall" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "( UOPS_EXECUTED.CORE / 2 / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@) ) if #SMT_on else UOPS_EXECUTED.CORE / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@)", "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "( UOPS_EXECUTED.CORE / 2 / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@) ) if #SMT_on else UOPS_EXECUTED.CORE / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { - 
"MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)", - "MetricGroup": "Branch_Mispredicts", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "BrMispredicts", "MetricName": "IpMispredict" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", "BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)", + "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", "MetricGroup": "Memory_Bound;Memory_Lat", "MetricName": "Load_Miss_Real_Latency" }, { + "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)", "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES", - "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. 
Per-thread)", "MetricGroup": "Memory_Bound;Memory_BW", "MetricName": "MLP" }, { - "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / cycles", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / cycles", "MetricGroup": "TLB", "MetricName": "Page_Walks_Utilization" }, { - "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "TLB_SMT", "MetricName": "Page_Walks_Utilization_SMT" }, { - "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]", + "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L1D_Cache_Fill_BW" }, { - "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L2_Cache_Fill_BW" }, { - "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", 
"MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Fill_BW" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L1MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2HPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / INST_RETIRED.ANY", "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L3MPKI" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { 
- "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { - "MetricExpr": "1000000000 * ( cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / cbox@event\\=0x35\\,umask\\=0x3\\,filter_opc\\=0x182@ ) / ( cbox_0@event\\=0x0@ / duration_time )", "BriefDescription": "Average latency of data read request to external memory (in nanoseconds). 
Accounts for demand loads and L1/L2 prefetches", + "MetricExpr": "1000000000 * ( cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / cbox@event\\=0x35\\,umask\\=0x3\\,filter_opc\\=0x182@ ) / ( cbox_0@event\\=0x0@ / duration_time )", "MetricGroup": "Memory_Lat", "MetricName": "DRAM_Read_Latency" }, { - "MetricExpr": "cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182\\,thresh\\=1@", "BriefDescription": "Average number of parallel data read requests to external memory. Accounts for demand loads and L1/L2 prefetches", + "MetricExpr": "cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / cbox@event\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182\\,thresh\\=1@", "MetricGroup": "Memory_BW", "MetricName": "DRAM_Parallel_Reads" }, { - "MetricExpr": "cbox_0@event\\=0x0@", "BriefDescription": "Socket actual clocks when any core is active on that socket", + "MetricExpr": "cbox_0@event\\=0x0@", "MetricGroup": "", "MetricName": "Socket_CLKS" }, { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency 
percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per package", "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json b/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json index bc4d5fc284a0..e2446966b651 100644 --- a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json +++ b/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json @@ -1,340 +1,340 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. 
For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. 
bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. 
For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Instruction per taken branch", - "MetricGroup": "Branches;PGO", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fetch_BW;PGO", "MetricName": "IpTB" }, { - "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Branch instructions per taken branch. ", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "MetricGroup": "Branches;PGO", "MetricName": "BpTB" }, { - "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + ICACHE.MISSES ) / 4 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + ICACHE.MISSES ) / 4 ) )", + "MetricGroup": "PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", + "MetricGroup": "DSB;Fetch_BW", 
"MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { + "BriefDescription": "Instructions per Load (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS", - "BriefDescription": "Instructions per Load (lower number means loads are more frequent)", - "MetricGroup": "Instruction_Type;L1_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpL" }, { + "BriefDescription": "Instructions per Store (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES", - "BriefDescription": "Instructions per Store", - "MetricGroup": "Instruction_Type;Store_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpS" }, { + "BriefDescription": "Instructions per Branch (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", - "BriefDescription": "Instructions per Branch", - "MetricGroup": 
"Branches;Instruction_Type;Port_5;Port_6", + "MetricGroup": "Branches;Instruction_Type", "MetricName": "IpB" }, { + "BriefDescription": "Instruction per (near) call (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", - "BriefDescription": "Instruction per (near) call", "MetricGroup": "Branches", "MetricName": "IpCall" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / cycles", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / cycles", "MetricGroup": "FLOPS", "MetricName": "FLOPc" }, { - "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE 
+ SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "FLOPS_SMT", "MetricName": "FLOPc_SMT" }, { - "MetricExpr": "UOPS_EXECUTED.THREAD / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "UOPS_EXECUTED.THREAD / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)", - "MetricGroup": "Branch_Mispredicts", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "BrMispredicts", "MetricName": "IpMispredict" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", 
"BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)", + "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", "MetricGroup": "Memory_Bound;Memory_Lat", "MetricName": "Load_Miss_Real_Latency" }, { + "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)", "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES", - "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-thread)", "MetricGroup": "Memory_Bound;Memory_BW", "MetricName": "MLP" }, { - "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / cycles", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / cycles", "MetricGroup": "TLB", "MetricName": "Page_Walks_Utilization" }, { - "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "TLB_SMT", "MetricName": "Page_Walks_Utilization_SMT" }, { - "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]", + "MetricExpr": "64 * 
L1D.REPLACEMENT / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L1D_Cache_Fill_BW" }, { - "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L2_Cache_Fill_BW" }, { - "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Fill_BW" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L1MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2HPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.LLC_MISS / INST_RETIRED.ANY", "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.LLC_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L3MPKI" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { - "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time", "BriefDescription": "Giga Floating Point Operations Per Second", + "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time", "MetricGroup": "FLOPS;Summary", "MetricName": "GFLOPs" }, { - "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - 
"MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", 
"MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per package", "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] diff --git a/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json b/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json index f3874b5f9995..9294769dec64 100644 --- a/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json +++ b/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json @@ -1,346 +1,346 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. 
Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. 
Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Instruction per taken branch", - "MetricGroup": "Branches;PGO", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fetch_BW;PGO", "MetricName": "IpTB" }, { - "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Branch instructions per taken branch. ", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "MetricGroup": "Branches;PGO", "MetricName": "BpTB" }, { - "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + ICACHE.MISSES ) / 4 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + ICACHE.MISSES ) / 4 ) )", + "MetricGroup": "PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", + "MetricGroup": "DSB;Fetch_BW", 
"MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { + "BriefDescription": "Instructions per Load (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS", - "BriefDescription": "Instructions per Load (lower number means loads are more frequent)", - "MetricGroup": "Instruction_Type;L1_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpL" }, { + "BriefDescription": "Instructions per Store (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES", - "BriefDescription": "Instructions per Store", - "MetricGroup": "Instruction_Type;Store_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpS" }, { + "BriefDescription": "Instructions per Branch (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", - "BriefDescription": "Instructions per Branch", - "MetricGroup": 
"Branches;Instruction_Type;Port_5;Port_6", + "MetricGroup": "Branches;Instruction_Type", "MetricName": "IpB" }, { + "BriefDescription": "Instruction per (near) call (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", - "BriefDescription": "Instruction per (near) call", "MetricGroup": "Branches", "MetricName": "IpCall" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / cycles", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / cycles", "MetricGroup": "FLOPS", "MetricName": "FLOPc" }, { - "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE 
+ SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "FLOPS_SMT", "MetricName": "FLOPc_SMT" }, { - "MetricExpr": "UOPS_EXECUTED.THREAD / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "UOPS_EXECUTED.THREAD / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)", - "MetricGroup": "Branch_Mispredicts", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "BrMispredicts", "MetricName": "IpMispredict" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", 
"BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)", + "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )", "MetricGroup": "Memory_Bound;Memory_Lat", "MetricName": "Load_Miss_Real_Latency" }, { + "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)", "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES", - "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-thread)", "MetricGroup": "Memory_Bound;Memory_BW", "MetricName": "MLP" }, { - "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / cycles", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / cycles", "MetricGroup": "TLB", "MetricName": "Page_Walks_Utilization" }, { - "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_DURATION + DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "TLB_SMT", "MetricName": "Page_Walks_Utilization_SMT" }, { - "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]", + "MetricExpr": "64 * 
L1D.REPLACEMENT / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L1D_Cache_Fill_BW" }, { - "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L2_Cache_Fill_BW" }, { - "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Fill_BW" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L1MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2HPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.LLC_MISS / INST_RETIRED.ANY", "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.LLC_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L3MPKI" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { - "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time", "BriefDescription": "Giga Floating Point Operations Per Second", + "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time", "MetricGroup": "FLOPS;Summary", "MetricName": "GFLOPs" }, { - "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - 
"MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { - "MetricExpr": "cbox_0@event\\=0x0@", "BriefDescription": "Socket actual clocks when any core is active on that socket", + "MetricExpr": "cbox_0@event\\=0x0@", "MetricGroup": "", "MetricName": "Socket_CLKS" }, { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 
residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per package", "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] diff --git a/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json b/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json index 98c73e430b05..603ff9c2e9a1 100644 --- a/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json +++ b/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json @@ -1,232 +1,232 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. 
For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. 
bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. 
For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + ICACHE.MISSES ) / 4 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + ICACHE.MISSES ) / 4 ) )", + "MetricGroup": "PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", + "MetricGroup": "DSB;Fetch_BW", "MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + 
"BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / cycles", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / cycles", "MetricGroup": "FLOPS", "MetricName": "FLOPc" }, { - 
"MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "FLOPS_SMT", "MetricName": "FLOPc_SMT" }, { - "MetricExpr": "UOPS_DISPATCHED.THREAD / (( cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@)", "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "UOPS_DISPATCHED.THREAD / (( cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { - "MetricExpr": "( (( 1 * ( 
FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time", "BriefDescription": "Giga Floating Point Operations Per Second", + "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time", "MetricGroup": "FLOPS;Summary", "MetricName": "GFLOPs" }, { - "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { - "MetricExpr": 
"cbox_0@event\\=0x0@", "BriefDescription": "Socket actual clocks when any core is active on that socket", + "MetricExpr": "cbox_0@event\\=0x0@", "MetricGroup": "", "MetricName": "Socket_CLKS" }, { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per package", "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json 
b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json index cfeba5067bab..c6b485b3a2cb 100644 --- a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json +++ b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json @@ -1,226 +1,226 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. 
bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. 
Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." 
}, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." 
}, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. 
They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." }, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + ICACHE.MISSES ) / 4 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + ICACHE.MISSES ) / 4 ) )", + "MetricGroup": 
"PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )", + "MetricGroup": "DSB;Fetch_BW", "MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
+ CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / cycles", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / cycles", "MetricGroup": "FLOPS", "MetricName": "FLOPc" }, { - "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "FLOPS_SMT", "MetricName": "FLOPc_SMT" }, { - "MetricExpr": "UOPS_DISPATCHED.THREAD / (( cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@)", "BriefDescription": "Instruction-Level-Parallelism 
(average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "UOPS_DISPATCHED.THREAD / (( cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { - "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time", "BriefDescription": "Giga Floating Point Operations Per Second", + "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time", "MetricGroup": "FLOPS;Summary", "MetricName": "GFLOPs" }, { - "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - 
CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ 
/ msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per package", "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] diff --git a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json index 2c95417a4dae..0ca539bb60f6 100644 --- a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json @@ -1,364 +1,370 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. 
For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. 
bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. 
For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Instruction per taken branch", - "MetricGroup": "Branches;PGO", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fetch_BW;PGO", "MetricName": "IpTB" }, { - "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Branch instructions per taken branch. ", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "MetricGroup": "Branches;PGO", "MetricName": "BpTB" }, { - "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )", + "MetricGroup": "PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ))", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)", + "MetricGroup": "DSB;Fetch_BW", 
"MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { + "BriefDescription": "Instructions per Load (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS", - "BriefDescription": "Instructions per Load (lower number means loads are more frequent)", - "MetricGroup": "Instruction_Type;L1_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpL" }, { + "BriefDescription": "Instructions per Store (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES", - "BriefDescription": "Instructions per Store", - "MetricGroup": "Instruction_Type;Store_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpS" }, { + "BriefDescription": "Instructions per Branch (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", - "BriefDescription": "Instructions per Branch", - "MetricGroup": 
"Branches;Instruction_Type;Port_5;Port_6", + "MetricGroup": "Branches;Instruction_Type", "MetricName": "IpB" }, { + "BriefDescription": "Instruction per (near) call (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", - "BriefDescription": "Instruction per (near) call", "MetricGroup": "Branches", "MetricName": "IpCall" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles", "MetricGroup": "FLOPS", "MetricName": "FLOPc" }, { - "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "FLOPS_SMT", "MetricName": "FLOPc_SMT" }, { - "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per non-speculative branch misprediction (jeclear)", "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) * (( INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) ) * (4 * cycles) / BR_MISP_RETIRED.ALL_BRANCHES", - "BriefDescription": "Branch Misprediction 
Cost: Fraction of TopDown slots wasted per branch misprediction (jeclear and baclear)", - "MetricGroup": "Branch_Mispredicts", + "MetricGroup": "BrMispredicts", "MetricName": "Branch_Misprediction_Cost" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per non-speculative branch misprediction (jeclear)", "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) * (( INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) / BR_MISP_RETIRED.ALL_BRANCHES", - "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per branch misprediction (jeclear and baclear)", - "MetricGroup": "Branch_Mispredicts_SMT", + "MetricGroup": "BrMispredicts_SMT", "MetricName": "Branch_Misprediction_Cost_SMT" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)", - "MetricGroup": "Branch_Mispredicts", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "BrMispredicts", "MetricName": "IpMispredict" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )", "BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)", + "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )", "MetricGroup": "Memory_Bound;Memory_Lat", "MetricName": "Load_Miss_Real_Latency" }, { + "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)", "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES", - "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-thread)", "MetricGroup": "Memory_Bound;Memory_BW", "MetricName": "MLP" }, { - "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * cycles )", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * cycles )", "MetricGroup": "TLB", "MetricName": "Page_Walks_Utilization" }, { - "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + 
DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )", "MetricGroup": "TLB_SMT", "MetricName": "Page_Walks_Utilization_SMT" }, { - "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]", + "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L1D_Cache_Fill_BW" }, { - "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L2_Cache_Fill_BW" }, { - "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Fill_BW" }, { - "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", + "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Access_BW" }, { - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY", "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L1MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * 
MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI" }, { - "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI_All" }, { - "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY", "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2HPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY", "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L3MPKI" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { - "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / duration_time", "BriefDescription": "Giga Floating Point Operations Per Second", + "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / duration_time", "MetricGroup": "FLOPS;Summary", "MetricName": "GFLOPs" }, { - "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { - "MetricExpr": "arb@event\\=0x80\\,umask\\=0x2@ / arb@event\\=0x80\\,umask\\=0x2\\,thresh\\=1@", "BriefDescription": "Average number of parallel data read requests to external memory. 
Accounts for demand loads and L1/L2 prefetches", + "MetricExpr": "arb@event\\=0x80\\,umask\\=0x2@ / arb@event\\=0x80\\,umask\\=0x2\\,thresh\\=1@", "MetricGroup": "Memory_BW", "MetricName": "DRAM_Parallel_Reads" }, { + "BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions. )", + "MetricExpr": "INST_RETIRED.ANY / ( BR_INST_RETIRED.FAR_BRANCH / 2 )", + "MetricGroup": "", + "MetricName": "IpFarBranch" + }, + { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per 
package", "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] diff --git a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json index 35b255fa6a79..047d7e11aa6f 100644 --- a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json @@ -1,376 +1,394 @@ [ { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Frontend_Bound" + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. 
Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound." }, { - "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Frontend_Bound_SMT" + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-ops (uops). Ideally the Frontend can issue 4 uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. SMT version; u se when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. 
Incorrect data speculation followed by Memory Ordering Nukes is another example.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Bad_Speculation" + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example." }, { - "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. 
SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Bad_Speculation_SMT" + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )", "MetricGroup": "TopdownL1", - "MetricName": "Backend_Bound" + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound." }, { - "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", - "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. 
Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Backend_Bound_SMT" + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. 
Backend Bound is further divided into two main categories: Memory Bound and Core Bound. SMT version; use when SMT is enabled and measuring per logical CPU." }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. ", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)", "MetricGroup": "TopdownL1", - "MetricName": "Retiring" + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. " }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", - "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. 
Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU.", "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. SMT version; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", "MetricGroup": "TopdownL1_SMT", - "MetricName": "Retiring_SMT" + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum 4 uops retired per cycle has been achieved. Maximizing Retiring typically increases the Instruction-Per-Cycle metric. Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Microcode assists are categorized under Retiring. They hurt performance and can often be avoided. SMT version; use when SMT is enabled and measuring per logical CPU." 
}, { + "BriefDescription": "Instructions Per Cycle (per Logical Processor)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Instructions Per Cycle (per logical thread)", "MetricGroup": "TopDownL1", "MetricName": "IPC" }, { - "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", "BriefDescription": "Uops Per Instruction", - "MetricGroup": "Pipeline;Retiring", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", + "MetricGroup": "Pipeline;Retire", "MetricName": "UPI" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Instruction per taken branch", - "MetricGroup": "Branches;PGO", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fetch_BW;PGO", "MetricName": "IpTB" }, { - "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "BriefDescription": "Branch instructions per taken branch. ", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", "MetricGroup": "Branches;PGO", "MetricName": "BpTB" }, { - "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )", "BriefDescription": "Rough Estimation of fraction of fetched lines bytes that were likely (includes speculatively fetches) consumed by program instructions", - "MetricGroup": "PGO", + "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )", + "MetricGroup": "PGO;IcMiss", "MetricName": "IFetch_Line_Utilization" }, { - "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ))", "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", - "MetricGroup": "DSB;Frontend_Bandwidth", + "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)", + "MetricGroup": "DSB;Fetch_BW", 
"MetricName": "DSB_Coverage" }, { + "BriefDescription": "Cycles Per Instruction (per Logical Processor)", "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)", - "BriefDescription": "Cycles Per Instruction (threaded)", "MetricGroup": "Pipeline;Summary", "MetricName": "CPI" }, { + "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.", "MetricExpr": "CPU_CLK_UNHALTED.THREAD", - "BriefDescription": "Per-thread actual clocks when the logical processor is active.", "MetricGroup": "Summary", "MetricName": "CLKS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * cycles", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1", "MetricName": "SLOTS" }, { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core)", "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", - "BriefDescription": "Total issue-pipeline slots (per core)", "MetricGroup": "TopDownL1_SMT", "MetricName": "SLOTS_SMT" }, { + "BriefDescription": "Instructions per Load (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS", - "BriefDescription": "Instructions per Load (lower number means loads are more frequent)", - "MetricGroup": "Instruction_Type;L1_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpL" }, { + "BriefDescription": "Instructions per Store (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES", - "BriefDescription": "Instructions per Store", - "MetricGroup": "Instruction_Type;Store_Bound", + "MetricGroup": "Instruction_Type", "MetricName": "IpS" }, { + "BriefDescription": "Instructions per Branch (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", - "BriefDescription": "Instructions per Branch", - "MetricGroup": 
"Branches;Instruction_Type;Port_5;Port_6", + "MetricGroup": "Branches;Instruction_Type", "MetricName": "IpB" }, { + "BriefDescription": "Instruction per (near) call (lower number means higher occurance rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", - "BriefDescription": "Instruction per (near) call", "MetricGroup": "Branches", "MetricName": "IpCall" }, { - "MetricExpr": "INST_RETIRED.ANY", "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary", "MetricName": "Instructions" }, { - "MetricExpr": "INST_RETIRED.ANY / cycles", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / cycles", "MetricGroup": "SMT", "MetricName": "CoreIPC" }, { - "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Instructions Per Cycle (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "SMT", "MetricName": "CoreIPC_SMT" }, { - "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / cycles", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / cycles", "MetricGroup": "FLOPS", "MetricName": "FLOPc" }, { - "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "BriefDescription": "Floating Point Operations Per Cycle", + "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", "MetricGroup": "FLOPS_SMT", "MetricName": "FLOPc_SMT" }, { - "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)", - "MetricGroup": "Pipeline;Ports_Utilization", + "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", + "MetricGroup": "Pipeline", "MetricName": "ILP" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per non-speculative branch misprediction (jeclear)", "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - 
UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) * (( INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) ) * (4 * cycles) / BR_MISP_RETIRED.ALL_BRANCHES", - "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per branch misprediction (jeclear and baclear)", - "MetricGroup": "Branch_Mispredicts", + "MetricGroup": "BrMispredicts", "MetricName": "Branch_Misprediction_Cost" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per non-speculative branch misprediction (jeclear)", "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) * (( INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) / BR_MISP_RETIRED.ALL_BRANCHES", - "BriefDescription": "Branch Misprediction Cost: Fraction of TopDown slots wasted per branch misprediction (jeclear and baclear)", - "MetricGroup": "Branch_Mispredicts_SMT", + "MetricGroup": "BrMispredicts_SMT", "MetricName": "Branch_Misprediction_Cost_SMT" }, { - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", "BriefDescription": "Number of Instructions per 
non-speculative Branch Misprediction (JEClear)", - "MetricGroup": "Branch_Mispredicts", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "BrMispredicts", "MetricName": "IpMispredict" }, { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core", "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "BriefDescription": "Core actual clocks when any thread is active on the physical core", "MetricGroup": "SMT", "MetricName": "CORE_CLKS" }, { - "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )", "BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)", + "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )", "MetricGroup": "Memory_Bound;Memory_Lat", "MetricName": "Load_Miss_Real_Latency" }, { + "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)", "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES", - "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. 
Per-thread)", "MetricGroup": "Memory_Bound;Memory_BW", "MetricName": "MLP" }, { - "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * cycles )", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * cycles )", "MetricGroup": "TLB", "MetricName": "Page_Walks_Utilization" }, { - "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )", "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )", "MetricGroup": "TLB_SMT", "MetricName": "Page_Walks_Utilization_SMT" }, { - "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]", + "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L1D_Cache_Fill_BW" }, { - "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L2_Cache_Fill_BW" }, { - "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 
cache [GB / sec]", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Fill_BW" }, { - "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time", "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", + "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time", "MetricGroup": "Memory_BW", "MetricName": "L3_Cache_Access_BW" }, { - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY", "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L1MPKI" }, { - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI" }, { - "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2MPKI_All" }, { - "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY", "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)", - "MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L2HPKI_All" }, { - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY", "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads", - 
"MetricGroup": "Cache_Misses;", + "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY", + "MetricGroup": "Cache_Misses", "MetricName": "L3MPKI" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", + "BriefDescription": "Rate of silent evictions from the L2 cache per Kilo instruction where the evicted lines are dropped (no writeback to L3 or memory)", + "MetricExpr": "1000 * L2_LINES_OUT.SILENT / INST_RETIRED.ANY", + "MetricGroup": "", + "MetricName": "L2_Evictions_Silent_PKI" + }, + { + "BriefDescription": "Rate of non silent evictions from the L2 cache per Kilo instruction", + "MetricExpr": "1000 * L2_LINES_OUT.NON_SILENT / INST_RETIRED.ANY", + "MetricGroup": "", + "MetricName": "L2_Evictions_NonSilent_PKI" + }, + { "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "CPU_Utilization" }, { - "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / 1000000000 ) / duration_time", "BriefDescription": "Giga Floating Point Operations Per Second", + "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / 1000000000 ) / duration_time", "MetricGroup": "FLOPS;Summary", "MetricName": "GFLOPs" }, { - "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Average Frequency 
Utilization relative nominal frequency", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Power", "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active", "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", - "BriefDescription": "Fraction of cycles where both hardware threads were active", "MetricGroup": "SMT;Summary", "MetricName": "SMT_2T_Utilization" }, { - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "BriefDescription": "Fraction of cycles spent in Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / CPU_CLK_UNHALTED.REF_TSC", "MetricGroup": "Summary", "MetricName": "Kernel_Utilization" }, { - "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]", + "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time", "MetricGroup": "Memory_BW", "MetricName": "DRAM_BW_Use" }, { - "MetricExpr": "1000000000 * ( cha@event\\=0x36\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ / cha@event\\=0x35\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ ) / ( cha_0@event\\=0x0@ / duration_time )", "BriefDescription": "Average latency of data read request to external memory (in nanoseconds). 
Accounts for demand loads and L1/L2 prefetches", + "MetricExpr": "1000000000 * ( cha@event\\=0x36\\\\\\,umask\\=0x21@ / cha@event\\=0x35\\\\\\,umask\\=0x21@ ) / ( cha_0@event\\=0x0@ / duration_time )", "MetricGroup": "Memory_Lat", "MetricName": "DRAM_Read_Latency" }, { - "MetricExpr": "cha@event\\=0x36\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ / cha@event\\=0x36\\\\\\,umask\\=0x21\\\\\\,thresh\\=1\\\\\\,config\\=0x40433@", "BriefDescription": "Average number of parallel data read requests to external memory. Accounts for demand loads and L1/L2 prefetches", + "MetricExpr": "cha@event\\=0x36\\\\\\,umask\\=0x21@ / cha@event\\=0x36\\\\\\,umask\\=0x21\\\\\\,thresh\\=1@", "MetricGroup": "Memory_BW", "MetricName": "DRAM_Parallel_Reads" }, { - "MetricExpr": "cha_0@event\\=0x0@", "BriefDescription": "Socket actual clocks when any core is active on that socket", + "MetricExpr": "cha_0@event\\=0x0@", "MetricGroup": "", "MetricName": "Socket_CLKS" }, { + "BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions. 
)", + "MetricExpr": "INST_RETIRED.ANY / ( BR_INST_RETIRED.FAR_BRANCH / 2 )", + "MetricGroup": "", + "MetricName": "IpFarBranch" + }, + { + "BriefDescription": "C3 residency percent per core", "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per core", "MetricName": "C3_Core_Residency" }, { + "BriefDescription": "C6 residency percent per core", "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per core", "MetricName": "C6_Core_Residency" }, { + "BriefDescription": "C7 residency percent per core", "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per core", "MetricName": "C7_Core_Residency" }, { + "BriefDescription": "C2 residency percent per package", "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C2 residency percent per package", "MetricName": "C2_Pkg_Residency" }, { + "BriefDescription": "C3 residency percent per package", "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C3 residency percent per package", "MetricName": "C3_Pkg_Residency" }, { + "BriefDescription": "C6 residency percent per package", "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C6 residency percent per package", "MetricName": "C6_Pkg_Residency" }, { + "BriefDescription": "C7 residency percent per package", "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100", "MetricGroup": "Power", - "BriefDescription": "C7 residency percent per package", "MetricName": "C7_Pkg_Residency" } ] -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 37/63] perf env: Add perf_env__numa_node() 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (34 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 36/63] perf vendor events intel: Update all the Intel JSON metrics from TMAM 3.6 Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 38/63] perf stat: Add --per-node agregation support Arnaldo Carvalho de Melo ` (26 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Alexander Shishkin, Alexey Budankov, Andi Kleen, Joe Mario, Kan Liang, Michael Petlan, Peter Zijlstra, Arnaldo Carvalho de Melo From: Jiri Olsa <jolsa@kernel.org> To speed up CPU-to-node lookup, add perf_env__numa_node(), which on the first lookup builds an array holding the NUMA node of each stored CPU.
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20190904073415.723-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/env.c | 40 ++++++++++++++++++++++++++++++++++++++++ tools/perf/util/env.h | 6 ++++++ 2 files changed, 46 insertions(+) diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c index 2a91a10ccfcc..6242a9215df7 100644 --- a/tools/perf/util/env.c +++ b/tools/perf/util/env.c @@ -180,6 +180,7 @@ void perf_env__exit(struct perf_env *env) zfree(&env->sibling_threads); zfree(&env->pmu_mappings); zfree(&env->cpu); + zfree(&env->numa_map); for (i = 0; i < env->nr_numa_nodes; i++) perf_cpu_map__put(env->numa_nodes[i].map); @@ -354,3 +355,42 @@ const char *perf_env__arch(struct perf_env *env) return normalize_arch(arch_name); } + + +int perf_env__numa_node(struct perf_env *env, int cpu) +{ + if (!env->nr_numa_map) { + struct numa_node *nn; + int i, nr = 0; + + for (i = 0; i < env->nr_numa_nodes; i++) { + nn = &env->numa_nodes[i]; + nr = max(nr, perf_cpu_map__max(nn->map)); + } + + nr++; + + /* + * We initialize the numa_map array to prepare + * it for missing cpus, which return node -1 + */ + env->numa_map = malloc(nr * sizeof(int)); + if (!env->numa_map) + return -1; + + for (i = 0; i < nr; i++) + env->numa_map[i] = -1; + + env->nr_numa_map = nr; + + for (i = 0; i < env->nr_numa_nodes; i++) { + int tmp, j; + + nn = &env->numa_nodes[i]; + perf_cpu_map__for_each_cpu(j, tmp, nn->map) + env->numa_map[j] = i; + } + } + + return cpu >= 0 && cpu < env->nr_numa_map ? 
env->numa_map[cpu] : -1; +} diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h index a3059dc1abe5..11d05ae3606a 100644 --- a/tools/perf/util/env.h +++ b/tools/perf/util/env.h @@ -87,6 +87,10 @@ struct perf_env { struct rb_root btfs; u32 btfs_cnt; } bpf_progs; + + /* For fast cpu to numa node lookup via perf_env__numa_node */ + int *numa_map; + int nr_numa_map; }; enum perf_compress_type { @@ -120,4 +124,6 @@ struct bpf_prog_info_node *perf_env__find_bpf_prog_info(struct perf_env *env, __u32 prog_id); void perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node); struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id); + +int perf_env__numa_node(struct perf_env *env, int cpu); #endif /* __PERF_ENV_H */ -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
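The lazy lookup table that the patch above describes can be sketched on its own as follows. This is only an illustration under stated assumptions, not the perf code: `struct node_cpus`, `struct topo` and `topo_numa_node()` are hypothetical names, plain `int` arrays stand in for `struct perf_cpu_map`, and CPU numbers are assumed to be non-negative.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the CPU list of one NUMA node. */
struct node_cpus {
	const int *cpus;	/* CPU numbers, assumed >= 0 */
	int nr;
};

/* Hypothetical stand-in for the parts of struct perf_env used here. */
struct topo {
	const struct node_cpus *nodes;
	int nr_nodes;
	int *numa_map;		/* cpu -> node, built on the first lookup */
	int nr_numa_map;	/* 0 until the table has been built */
};

/* Return the node of @cpu, or -1 if unknown or if allocation failed. */
int topo_numa_node(struct topo *t, int cpu)
{
	assert(t);

	if (!t->nr_numa_map) {
		int i, j, nr = 0;

		/* Size the table by the highest CPU number seen. */
		for (i = 0; i < t->nr_nodes; i++)
			for (j = 0; j < t->nodes[i].nr; j++)
				if (t->nodes[i].cpus[j] > nr)
					nr = t->nodes[i].cpus[j];
		nr++;

		t->numa_map = malloc(nr * sizeof(int));
		if (!t->numa_map)
			return -1;

		/* CPUs that belong to no node keep the value -1. */
		for (i = 0; i < nr; i++)
			t->numa_map[i] = -1;

		for (i = 0; i < t->nr_nodes; i++)
			for (j = 0; j < t->nodes[i].nr; j++)
				t->numa_map[t->nodes[i].cpus[j]] = i;

		t->nr_numa_map = nr;
	}

	return cpu >= 0 && cpu < t->nr_numa_map ? t->numa_map[cpu] : -1;
}
```

The first call pays for one pass over every node's CPU list; later calls are a bounds check plus an array read. The caller owns `numa_map` and frees it when done, in the same way the patch adds a `zfree()` to `perf_env__exit()`.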
* [PATCH 38/63] perf stat: Add --per-node agregation support 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (35 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 37/63] perf env: Add perf_env__numa_node() Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 39/63] perf tools: Fix cross compile for ARM64 Arnaldo Carvalho de Melo ` (25 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Alexander Shishkin, Alexey Budankov, Andi Kleen, Joe Mario, Kan Liang, Michael Petlan, Peter Zijlstra, Arnaldo Carvalho de Melo From: Jiri Olsa <jolsa@kernel.org> Adding new --per-node option to aggregate counts per NUMA nodes for system-wide mode measurements. You can specify --per-node in live mode: # perf stat -a -I 1000 -e cycles --per-node # time node cpus counts unit events 1.000542550 N0 20 6,202,097 cycles 1.000542550 N1 20 639,559 cycles 2.002040063 N0 20 7,412,495 cycles 2.002040063 N1 20 2,185,577 cycles 3.003451699 N0 20 6,508,917 cycles 3.003451699 N1 20 765,607 cycles ... 
Or in the record/report stat session: # perf stat record -a -I 1000 -e cycles # time counts unit events 1.000536937 10,008,468 cycles 2.002090152 9,578,539 cycles 3.003625233 7,647,869 cycles 4.005135036 7,032,086 cycles ^C 4.340902364 3,923,893 cycles # perf stat report --per-node # time node cpus counts unit events 1.000536937 N0 20 9,355,086 cycles 1.000536937 N1 20 653,382 cycles 2.002090152 N0 20 7,712,838 cycles 2.002090152 N1 20 1,865,701 cycles 3.003625233 N0 20 6,604,441 cycles 3.003625233 N1 20 1,043,428 cycles 4.005135036 N0 20 6,350,522 cycles 4.005135036 N1 20 681,564 cycles 4.340902364 N0 20 3,403,188 cycles 4.340902364 N1 20 520,705 cycles Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20190904073415.723-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/Documentation/perf-stat.txt | 5 +++ tools/perf/builtin-stat.c | 52 ++++++++++++++++++++++++++ tools/perf/util/cpumap.c | 18 +++++++++ tools/perf/util/cpumap.h | 3 ++ tools/perf/util/stat-display.c | 15 ++++++++ tools/perf/util/stat.c | 1 + tools/perf/util/stat.h | 1 + 7 files changed, 95 insertions(+) diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index a9af4e440e80..9431b8066fb4 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -217,6 +217,11 @@ core number and the number of online logical processors on that physical process Aggregate counts per monitored threads, when monitoring threads (-t option) or processes (-p option). 
+--per-node:: +Aggregate counts per NUMA nodes for system-wide mode measurements. This +is a useful mode to detect imbalance between NUMA nodes. To enable this +mode, use --per-node in addition to -a. (system-wide). + -D msecs:: --delay msecs:: After starting the program, wait msecs before measuring. This is useful to diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index c88d4e118409..5964e808d73d 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -792,6 +792,8 @@ static struct option stat_options[] = { "aggregate counts per physical processor core", AGGR_CORE), OPT_SET_UINT(0, "per-thread", &stat_config.aggr_mode, "aggregate counts per thread", AGGR_THREAD), + OPT_SET_UINT(0, "per-node", &stat_config.aggr_mode, + "aggregate counts per numa node", AGGR_NODE), OPT_UINTEGER('D', "delay", &stat_config.initial_delay, "ms to wait before starting measurement after program start"), OPT_CALLBACK_NOOPT(0, "metric-only", &stat_config.metric_only, NULL, @@ -830,6 +832,12 @@ static int perf_stat__get_core(struct perf_stat_config *config __maybe_unused, return cpu_map__get_core(map, cpu, NULL); } +static int perf_stat__get_node(struct perf_stat_config *config __maybe_unused, + struct perf_cpu_map *map, int cpu) +{ + return cpu_map__get_node(map, cpu, NULL); +} + static int perf_stat__get_aggr(struct perf_stat_config *config, aggr_get_id_t get_id, struct perf_cpu_map *map, int idx) { @@ -864,6 +872,12 @@ static int perf_stat__get_core_cached(struct perf_stat_config *config, return perf_stat__get_aggr(config, perf_stat__get_core, map, idx); } +static int perf_stat__get_node_cached(struct perf_stat_config *config, + struct perf_cpu_map *map, int idx) +{ + return perf_stat__get_aggr(config, perf_stat__get_node, map, idx); +} + static bool term_percore_set(void) { struct evsel *counter; @@ -902,6 +916,13 @@ static int perf_stat_init_aggr_mode(void) } stat_config.aggr_get_id = perf_stat__get_core_cached; break; + case AGGR_NODE: + if 
(cpu_map__build_node_map(evsel_list->core.cpus, &stat_config.aggr_map)) { + perror("cannot build core map"); + return -1; + } + stat_config.aggr_get_id = perf_stat__get_node_cached; + break; case AGGR_NONE: if (term_percore_set()) { if (cpu_map__build_core_map(evsel_list->core.cpus, @@ -1014,6 +1035,13 @@ static int perf_env__get_core(struct perf_cpu_map *map, int idx, void *data) return core; } +static int perf_env__get_node(struct perf_cpu_map *map, int idx, void *data) +{ + int cpu = perf_env__get_cpu(data, map, idx); + + return perf_env__numa_node(data, cpu); +} + static int perf_env__build_socket_map(struct perf_env *env, struct perf_cpu_map *cpus, struct perf_cpu_map **sockp) { @@ -1032,6 +1060,12 @@ static int perf_env__build_core_map(struct perf_env *env, struct perf_cpu_map *c return cpu_map__build_map(cpus, corep, perf_env__get_core, env); } +static int perf_env__build_node_map(struct perf_env *env, struct perf_cpu_map *cpus, + struct perf_cpu_map **nodep) +{ + return cpu_map__build_map(cpus, nodep, perf_env__get_node, env); +} + static int perf_stat__get_socket_file(struct perf_stat_config *config __maybe_unused, struct perf_cpu_map *map, int idx) { @@ -1049,6 +1083,12 @@ static int perf_stat__get_core_file(struct perf_stat_config *config __maybe_unus return perf_env__get_core(map, idx, &perf_stat.session->header.env); } +static int perf_stat__get_node_file(struct perf_stat_config *config __maybe_unused, + struct perf_cpu_map *map, int idx) +{ + return perf_env__get_node(map, idx, &perf_stat.session->header.env); +} + static int perf_stat_init_aggr_mode_file(struct perf_stat *st) { struct perf_env *env = &st->session->header.env; @@ -1075,6 +1115,13 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st) } stat_config.aggr_get_id = perf_stat__get_core_file; break; + case AGGR_NODE: + if (perf_env__build_node_map(env, evsel_list->core.cpus, &stat_config.aggr_map)) { + perror("cannot build core map"); + return -1; + } + stat_config.aggr_get_id = 
perf_stat__get_node_file; + break; case AGGR_NONE: case AGGR_GLOBAL: case AGGR_THREAD: @@ -1622,6 +1669,8 @@ static int __cmd_report(int argc, const char **argv) "aggregate counts per processor die", AGGR_DIE), OPT_SET_UINT(0, "per-core", &perf_stat.aggr_mode, "aggregate counts per physical processor core", AGGR_CORE), + OPT_SET_UINT(0, "per-node", &perf_stat.aggr_mode, + "aggregate counts per numa node", AGGR_NODE), OPT_SET_UINT('A', "no-aggr", &perf_stat.aggr_mode, "disable CPU count aggregation", AGGR_NONE), OPT_END() @@ -1896,6 +1945,9 @@ int cmd_stat(int argc, const char **argv) } } + if (stat_config.aggr_mode == AGGR_NODE) + cpu__setup_cpunode_map(); + if (stat_config.times && interval) interval_count = true; else if (stat_config.times && !interval) { diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c index a22c1114e880..983b7388f22b 100644 --- a/tools/perf/util/cpumap.c +++ b/tools/perf/util/cpumap.c @@ -206,6 +206,11 @@ int cpu_map__get_core_id(int cpu) return ret ?: value; } +int cpu_map__get_node_id(int cpu) +{ + return cpu__get_node(cpu); +} + int cpu_map__get_core(struct perf_cpu_map *map, int idx, void *data) { int cpu, s_die; @@ -235,6 +240,14 @@ int cpu_map__get_core(struct perf_cpu_map *map, int idx, void *data) return (s_die << 16) | (cpu & 0xffff); } +int cpu_map__get_node(struct perf_cpu_map *map, int idx, void *data __maybe_unused) +{ + if (idx < 0 || idx >= map->nr) + return -1; + + return cpu_map__get_node_id(map->map[idx]); +} + int cpu_map__build_socket_map(struct perf_cpu_map *cpus, struct perf_cpu_map **sockp) { return cpu_map__build_map(cpus, sockp, cpu_map__get_socket, NULL); @@ -250,6 +263,11 @@ int cpu_map__build_core_map(struct perf_cpu_map *cpus, struct perf_cpu_map **cor return cpu_map__build_map(cpus, corep, cpu_map__get_core, NULL); } +int cpu_map__build_node_map(struct perf_cpu_map *cpus, struct perf_cpu_map **numap) +{ + return cpu_map__build_map(cpus, numap, cpu_map__get_node, NULL); +} + /* setup simple routines 
to easily access node numbers given a cpu number */ static int get_max_num(char *path, int *max) { diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h index 2553bef1279d..57943f3685f8 100644 --- a/tools/perf/util/cpumap.h +++ b/tools/perf/util/cpumap.h @@ -20,9 +20,12 @@ int cpu_map__get_die_id(int cpu); int cpu_map__get_die(struct perf_cpu_map *map, int idx, void *data); int cpu_map__get_core_id(int cpu); int cpu_map__get_core(struct perf_cpu_map *map, int idx, void *data); +int cpu_map__get_node_id(int cpu); +int cpu_map__get_node(struct perf_cpu_map *map, int idx, void *data); int cpu_map__build_socket_map(struct perf_cpu_map *cpus, struct perf_cpu_map **sockp); int cpu_map__build_die_map(struct perf_cpu_map *cpus, struct perf_cpu_map **diep); int cpu_map__build_core_map(struct perf_cpu_map *cpus, struct perf_cpu_map **corep); +int cpu_map__build_node_map(struct perf_cpu_map *cpus, struct perf_cpu_map **nodep); const struct perf_cpu_map *cpu_map__online(void); /* thread unsafe */ static inline int cpu_map__socket(struct perf_cpu_map *sock, int s) diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c index ed3b0ac2f785..bc31fccc0057 100644 --- a/tools/perf/util/stat-display.c +++ b/tools/perf/util/stat-display.c @@ -100,6 +100,15 @@ static void aggr_printout(struct perf_stat_config *config, nr, config->csv_sep); break; + case AGGR_NODE: + fprintf(config->output, "N%*d%s%*d%s", + config->csv_output ? 0 : -5, + id, + config->csv_sep, + config->csv_output ? 
0 : 4, + nr, + config->csv_sep); + break; case AGGR_NONE: if (evsel->percore) { fprintf(config->output, "S%d-D%d-C%*d%s", @@ -965,6 +974,11 @@ static void print_interval(struct perf_stat_config *config, if ((num_print_interval == 0 && !config->csv_output) || config->interval_clear) { switch (config->aggr_mode) { + case AGGR_NODE: + fprintf(output, "# time node cpus"); + if (!metric_only) + fprintf(output, " counts %*s events\n", unit_width, "unit"); + break; case AGGR_SOCKET: fprintf(output, "# time socket cpus"); if (!metric_only) @@ -1188,6 +1202,7 @@ perf_evlist__print_counters(struct evlist *evlist, case AGGR_CORE: case AGGR_DIE: case AGGR_SOCKET: + case AGGR_NODE: print_aggr(config, evlist, prefix); break; case AGGR_THREAD: diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c index 6822e4ffe224..332cb730785b 100644 --- a/tools/perf/util/stat.c +++ b/tools/perf/util/stat.c @@ -299,6 +299,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel, case AGGR_CORE: case AGGR_DIE: case AGGR_SOCKET: + case AGGR_NODE: case AGGR_NONE: if (!evsel->snapshot) perf_evsel__compute_deltas(evsel, cpu, thread, count); diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h index 081c4a5113c6..bfa9aaf36ce6 100644 --- a/tools/perf/util/stat.h +++ b/tools/perf/util/stat.h @@ -47,6 +47,7 @@ enum aggr_mode { AGGR_CORE, AGGR_THREAD, AGGR_UNSET, + AGGR_NODE, }; enum { -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
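As a rough picture of what per-node aggregation computes, the sketch below sums per-CPU counts into per-node buckets using a cpu-to-node table. It is a simplified model rather than the perf implementation: `struct node_sum`, `aggregate_per_node()` and the `MAX_NODES` limit are hypothetical, and the real code goes through the `aggr_map` and `aggr_get_id` callbacks shown in the diff.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 8	/* illustrative limit, not taken from perf */

struct node_sum {
	int cpus;			/* CPUs accounted to this node */
	unsigned long long count;	/* summed event count */
};

/*
 * Sum per-CPU counts into per-node buckets. cpu_node[i] is the node of
 * CPU i, or -1 if it is unknown; such CPUs are skipped. Returns the
 * number of CPUs that were accounted to some node.
 */
int aggregate_per_node(const int *cpu_node, const unsigned long long *counts,
		       int nr_cpus, struct node_sum sums[MAX_NODES])
{
	int cpu, used = 0;

	assert(cpu_node && counts && sums);
	memset(sums, 0, MAX_NODES * sizeof(*sums));

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		int node = cpu_node[cpu];

		if (node < 0 || node >= MAX_NODES)
			continue;	/* unknown node: leave it out */

		sums[node].cpus++;
		sums[node].count += counts[cpu];
		used++;
	}

	return used;
}
```

Each bucket corresponds to one output row in the examples above: the node id is printed as `N<id>`, followed by the number of CPUs in the bucket and the summed count.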
* [PATCH 39/63] perf tools: Fix cross compile for ARM64 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (36 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 38/63] perf stat: Add --per-node agregation support Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 40/63] perf inject: Make --strip keep evsels Arnaldo Carvalho de Melo ` (24 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, John Garry, Alexander Shishkin, Jiri Olsa, Mark Rutland, Peter Zijlstra, Will Deacon, linux-arm-kernel, Arnaldo Carvalho de Melo From: John Garry <john.garry@huawei.com> Currently when cross compiling perf tool for ARM64 on my x86 machine I get this error: arch/arm64/util/sym-handling.c:9:10: fatal error: gelf.h: No such file or directory #include <gelf.h> For the build, libelf is reported off: Auto-detecting system features: ... ... libelf: [ OFF ] Indeed, test-libelf is not built successfully: more ./build/feature/test-libelf.make.output test-libelf.c:2:10: fatal error: libelf.h: No such file or directory #include <libelf.h> ^~~~~~~~~~ compilation terminated. I have no such problems natively compiling on ARM64, and I did not previously have this issue for cross compiling. Fix by relocating the gelf.h include. 
Signed-off-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/1573045254-39833-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/arch/arm64/util/sym-handling.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/perf/arch/arm64/util/sym-handling.c b/tools/perf/arch/arm64/util/sym-handling.c index 5df788985130..8dfa3e5229f1 100644 --- a/tools/perf/arch/arm64/util/sym-handling.c +++ b/tools/perf/arch/arm64/util/sym-handling.c @@ -6,9 +6,10 @@ #include "symbol.h" // for the elf__needs_adjust_symbols() prototype #include <stdbool.h> -#include <gelf.h> #ifdef HAVE_LIBELF_SUPPORT +#include <gelf.h> + bool elf__needs_adjust_symbols(GElf_Ehdr ehdr) { return ehdr.e_type == ET_EXEC || -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 40/63] perf inject: Make --strip keep evsels 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (37 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 39/63] perf tools: Fix cross compile for ARM64 Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 41/63] perf parse: Add parse events handle error Arnaldo Carvalho de Melo ` (23 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Arnaldo Carvalho de Melo From: Adrian Hunter <adrian.hunter@intel.com> create_gcov (refer to the autofdo example in tools/perf/Documentation/intel-pt.txt) now needs the evsels to read the perf.data file. So don't strip them. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191105100057.21465-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/builtin-inject.c | 54 ------------------------------------- 1 file changed, 54 deletions(-) diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index 372ecb3e2c06..1e5d28311e14 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -578,58 +578,6 @@ static void strip_init(struct perf_inject *inject) evsel->handler = drop_sample; } -static bool has_tracking(struct evsel *evsel) -{ - return evsel->core.attr.mmap || evsel->core.attr.mmap2 || evsel->core.attr.comm || - evsel->core.attr.task; -} - -#define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \ - PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_IDENTIFIER) - -/* - * In order that the perf.data file is parsable, tracking events like MMAP need - * their selected event to exist, except if there is only 1 selected 
event left - * and it has a compatible sample type. - */ -static bool ok_to_remove(struct evlist *evlist, - struct evsel *evsel_to_remove) -{ - struct evsel *evsel; - int cnt = 0; - bool ok = false; - - if (!has_tracking(evsel_to_remove)) - return true; - - evlist__for_each_entry(evlist, evsel) { - if (evsel->handler != drop_sample) { - cnt += 1; - if ((evsel->core.attr.sample_type & COMPAT_MASK) == - (evsel_to_remove->core.attr.sample_type & COMPAT_MASK)) - ok = true; - } - } - - return ok && cnt == 1; -} - -static void strip_fini(struct perf_inject *inject) -{ - struct evlist *evlist = inject->session->evlist; - struct evsel *evsel, *tmp; - - /* Remove non-synthesized evsels if possible */ - evlist__for_each_entry_safe(evlist, tmp, evsel) { - if (evsel->handler == drop_sample && - ok_to_remove(evlist, evsel)) { - pr_debug("Deleting %s\n", perf_evsel__name(evsel)); - evlist__remove(evlist, evsel); - evsel__delete(evsel); - } - } -} - static int __cmd_inject(struct perf_inject *inject) { int ret = -EINVAL; @@ -729,8 +677,6 @@ static int __cmd_inject(struct perf_inject *inject) evlist__remove(session->evlist, evsel); evsel__delete(evsel); } - if (inject->strip) - strip_fini(inject); } session->header.data_offset = output_data_offset; session->header.data_size = inject->bytes_written; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 41/63] perf parse: Add parse events handle error 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (38 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 40/63] perf inject: Make --strip keep evsels Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 42/63] perf parse: Ensure config and str in terms are unique Arnaldo Carvalho de Melo ` (22 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Ian Rogers, Adrian Hunter, Alexander Shishkin, Alexei Starovoitov, Andi Kleen, Daniel Borkmann, Jin Yao, John Garry, Kan Liang, Mark Rutland, Martin KaFai Lau, Peter Zijlstra, Song Liu, Stephane Eranian, Yonghong Song, bpf, cl From: Ian Rogers <irogers@google.com> Parse event error handling may overwrite one error string with another creating memory leaks. Introduce a helper routine that warns about multiple error messages as well as avoiding the memory leak. A reproduction of this problem can be seen with: perf stat -e c/c/ After this change this produces: WARNING: multiple event parsing errors event syntax error: 'c/c/' \___ unknown term valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. 
use 'perf list' to list available events Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/parse-events.c | 82 +++++++++++++++++++++------------- tools/perf/util/parse-events.h | 2 + tools/perf/util/pmu.c | 30 ++++++++----- 3 files changed, 71 insertions(+), 43 deletions(-) diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index d36b8129b27a..03e54a2d8685 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -182,6 +182,20 @@ static int tp_event_has_id(const char *dir_path, struct dirent *evt_dir) #define MAX_EVENT_LENGTH 512 +void parse_events__handle_error(struct parse_events_error *err, int idx, + char *str, char *help) +{ + if (WARN(!str, "WARNING: failed to provide error string\n")) { + free(help); + return; + } + WARN_ONCE(err->str, "WARNING: multiple event parsing errors\n"); + err->idx = idx; + free(err->str); + err->str = str; + free(err->help); + err->help = help; +} struct tracepoint_path *tracepoint_id_to_path(u64 config) { @@ -932,11 +946,11 @@ static int check_type_val(struct parse_events_term *term, return 0; if (err) { - 
err->idx = term->err_val; - if (type == PARSE_EVENTS__TERM_TYPE_NUM) - err->str = strdup("expected numeric value"); - else - err->str = strdup("expected string value"); + parse_events__handle_error(err, term->err_val, + type == PARSE_EVENTS__TERM_TYPE_NUM + ? strdup("expected numeric value") + : strdup("expected string value"), + NULL); } return -EINVAL; } @@ -972,8 +986,11 @@ static bool config_term_shrinked; static bool config_term_avail(int term_type, struct parse_events_error *err) { + char *err_str; + if (term_type < 0 || term_type >= __PARSE_EVENTS__TERM_TYPE_NR) { - err->str = strdup("Invalid term_type"); + parse_events__handle_error(err, -1, + strdup("Invalid term_type"), NULL); return false; } if (!config_term_shrinked) @@ -992,9 +1009,9 @@ config_term_avail(int term_type, struct parse_events_error *err) return false; /* term_type is validated so indexing is safe */ - if (asprintf(&err->str, "'%s' is not usable in 'perf stat'", - config_term_names[term_type]) < 0) - err->str = NULL; + if (asprintf(&err_str, "'%s' is not usable in 'perf stat'", + config_term_names[term_type]) >= 0) + parse_events__handle_error(err, -1, err_str, NULL); return false; } } @@ -1036,17 +1053,20 @@ do { \ case PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE: CHECK_TYPE_VAL(STR); if (strcmp(term->val.str, "no") && - parse_branch_str(term->val.str, &attr->branch_sample_type)) { - err->str = strdup("invalid branch sample type"); - err->idx = term->err_val; + parse_branch_str(term->val.str, + &attr->branch_sample_type)) { + parse_events__handle_error(err, term->err_val, + strdup("invalid branch sample type"), + NULL); return -EINVAL; } break; case PARSE_EVENTS__TERM_TYPE_TIME: CHECK_TYPE_VAL(NUM); if (term->val.num > 1) { - err->str = strdup("expected 0 or 1"); - err->idx = term->err_val; + parse_events__handle_error(err, term->err_val, + strdup("expected 0 or 1"), + NULL); return -EINVAL; } break; @@ -1080,8 +1100,9 @@ do { \ case PARSE_EVENTS__TERM_TYPE_PERCORE: CHECK_TYPE_VAL(NUM); if 
((unsigned int)term->val.num > 1) { - err->str = strdup("expected 0 or 1"); - err->idx = term->err_val; + parse_events__handle_error(err, term->err_val, + strdup("expected 0 or 1"), + NULL); return -EINVAL; } break; @@ -1089,9 +1110,9 @@ do { \ CHECK_TYPE_VAL(NUM); break; default: - err->str = strdup("unknown term"); - err->idx = term->err_term; - err->help = parse_events_formats_error_string(NULL); + parse_events__handle_error(err, term->err_term, + strdup("unknown term"), + parse_events_formats_error_string(NULL)); return -EINVAL; } @@ -1142,9 +1163,9 @@ static int config_term_tracepoint(struct perf_event_attr *attr, return config_term_common(attr, term, err); default: if (err) { - err->idx = term->err_term; - err->str = strdup("unknown term"); - err->help = strdup("valid terms: call-graph,stack-size\n"); + parse_events__handle_error(err, term->err_term, + strdup("unknown term"), + strdup("valid terms: call-graph,stack-size\n")); } return -EINVAL; } @@ -1323,10 +1344,12 @@ int parse_events_add_pmu(struct parse_events_state *parse_state, pmu = perf_pmu__find(name); if (!pmu) { - if (asprintf(&err->str, + char *err_str; + + if (asprintf(&err_str, "Cannot find PMU `%s'. 
Missing kernel support?", - name) < 0) - err->str = NULL; + name) >= 0) + parse_events__handle_error(err, -1, err_str, NULL); return -EINVAL; } @@ -2802,13 +2825,10 @@ void parse_events__clear_array(struct parse_events_array *a) void parse_events_evlist_error(struct parse_events_state *parse_state, int idx, const char *str) { - struct parse_events_error *err = parse_state->error; - - if (!err) + if (!parse_state->error) return; - err->idx = idx; - err->str = strdup(str); - WARN_ONCE(!err->str, "WARNING: failed to allocate error string"); + + parse_events__handle_error(parse_state->error, idx, strdup(str), NULL); } static void config_terms_list(char *buf, size_t buf_sz) diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h index 769e07cddaa2..34f58d24a06a 100644 --- a/tools/perf/util/parse-events.h +++ b/tools/perf/util/parse-events.h @@ -124,6 +124,8 @@ struct parse_events_state { struct list_head *terms; }; +void parse_events__handle_error(struct parse_events_error *err, int idx, + char *str, char *help); void parse_events__shrink_config_terms(void); int parse_events__is_hardcoded_term(struct parse_events_term *term); int parse_events_term__num(struct parse_events_term **term, diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index adbe97e941dd..f9f427d4c313 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -1050,9 +1050,9 @@ static int pmu_config_term(struct list_head *formats, if (err) { char *pmu_term = pmu_formats_string(formats); - err->idx = term->err_term; - err->str = strdup("unknown term"); - err->help = parse_events_formats_error_string(pmu_term); + parse_events__handle_error(err, term->err_term, + strdup("unknown term"), + parse_events_formats_error_string(pmu_term)); free(pmu_term); } return -EINVAL; @@ -1080,8 +1080,9 @@ static int pmu_config_term(struct list_head *formats, if (term->no_value && bitmap_weight(format->bits, PERF_PMU_FORMAT_BITS) > 1) { if (err) { - err->idx = term->err_val; - err->str = 
strdup("no value assigned for term"); + parse_events__handle_error(err, term->err_val, + strdup("no value assigned for term"), + NULL); } return -EINVAL; } @@ -1094,8 +1095,9 @@ static int pmu_config_term(struct list_head *formats, term->config, term->val.str); } if (err) { - err->idx = term->err_val; - err->str = strdup("expected numeric value"); + parse_events__handle_error(err, term->err_val, + strdup("expected numeric value"), + NULL); } return -EINVAL; } @@ -1108,11 +1110,15 @@ static int pmu_config_term(struct list_head *formats, max_val = pmu_format_max_value(format->bits); if (val > max_val) { if (err) { - err->idx = term->err_val; - if (asprintf(&err->str, - "value too big for format, maximum is %llu", - (unsigned long long)max_val) < 0) - err->str = strdup("value too big for format"); + char *err_str; + + parse_events__handle_error(err, term->err_val, + asprintf(&err_str, + "value too big for format, maximum is %llu", + (unsigned long long)max_val) < 0 + ? strdup("value too big for format") + : err_str, + NULL); return -EINVAL; } /* -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 42/63] perf parse: Ensure config and str in terms are unique
  2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (39 preceding siblings ...)
  2019-11-07 18:59 ` [PATCH 41/63] perf parse: Add parse events handle error Arnaldo Carvalho de Melo
@ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo
  2019-11-07 18:59 ` [PATCH 43/63] perf parse: Add destructors for parse event terms Arnaldo Carvalho de Melo
                   ` (21 subsequent siblings)
  62 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Alexei Starovoitov, Andi Kleen, Daniel Borkmann, Jin Yao,
	John Garry, Kan Liang, Mark Rutland, Martin KaFai Lau,
	Peter Zijlstra, Song Liu, Stephane Eranian, Yonghong Song, bpf, cl

From: Ian Rogers <irogers@google.com>

Make it easier to release the memory associated with parse event terms by
duplicating the string for the config name and ensuring that the val string
is a duplicate.

Currently the parser may leak the memory held by terms; this is addressed in
a later patch.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: clang-built-linux@googlegroups.com
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20191030223448.12930-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.c | 51 ++++++++++++++++++++++++++++------
 tools/perf/util/parse-events.y |  4 ++-
 2 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 03e54a2d8685..578288c94d2a 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1412,7 +1412,6 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
 int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 			       char *str, struct list_head **listp)
 {
-	struct list_head *head;
 	struct parse_events_term *term;
 	struct list_head *list;
 	struct perf_pmu *pmu = NULL;
@@ -1429,19 +1428,30 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 
 		list_for_each_entry(alias, &pmu->aliases, list) {
 			if (!strcasecmp(alias->name, str)) {
+				struct list_head *head;
+				char *config;
+
 				head = malloc(sizeof(struct list_head));
 				if (!head)
 					return -1;
 				INIT_LIST_HEAD(head);
-				if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER,
-							   str, 1, false, &str, NULL) < 0)
+				config = strdup(str);
+				if (!config)
+					return -1;
+				if (parse_events_term__num(&term,
+						   PARSE_EVENTS__TERM_TYPE_USER,
+						   config, 1, false, &config,
+						   NULL) < 0) {
+					free(list);
+					free(config);
 					return -1;
+				}
 				list_add_tail(&term->list, head);
 
 				if (!parse_events_add_pmu(parse_state, list,
 							  pmu->name, head,
 							  true, true)) {
-					pr_debug("%s -> %s/%s/\n", str,
+					pr_debug("%s -> %s/%s/\n", config,
 						 pmu->name, alias->str);
 					ok++;
 				}
@@ -1450,8 +1460,10 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 			}
 		}
 	}
-	if (!ok)
+	if (!ok) {
+		free(list);
 		return -1;
+	}
 	*listp = list;
 	return 0;
 }
@@ -2746,30 +2758,51 @@ int parse_events_term__sym_hw(struct parse_events_term **term,
 			      char *config, unsigned idx)
 {
 	struct event_symbol *sym;
+	char *str;
 	struct parse_events_term temp = {
 		.type_val  = PARSE_EVENTS__TERM_TYPE_STR,
 		.type_term = PARSE_EVENTS__TERM_TYPE_USER,
-		.config    = config ?: (char *) "event",
+		.config    = config,
 	};
 
+	if (!temp.config) {
+		temp.config = strdup("event");
+		if (!temp.config)
+			return -ENOMEM;
+	}
 	BUG_ON(idx >= PERF_COUNT_HW_MAX);
 	sym = &event_symbols_hw[idx];
 
-	return new_term(term, &temp, (char *) sym->symbol, 0);
+	str = strdup(sym->symbol);
+	if (!str)
+		return -ENOMEM;
+	return new_term(term, &temp, str, 0);
 }
 
 int parse_events_term__clone(struct parse_events_term **new,
 			     struct parse_events_term *term)
 {
+	char *str;
 	struct parse_events_term temp = {
 		.type_val  = term->type_val,
 		.type_term = term->type_term,
-		.config    = term->config,
+		.config    = NULL,
 		.err_term  = term->err_term,
 		.err_val   = term->err_val,
 	};
 
-	return new_term(new, &temp, term->val.str, term->val.num);
+	if (term->config) {
+		temp.config = strdup(term->config);
+		if (!temp.config)
+			return -ENOMEM;
+	}
+	if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM)
+		return new_term(new, &temp, NULL, term->val.num);
+
+	str = strdup(term->val.str);
+	if (!str)
+		return -ENOMEM;
+	return new_term(new, &temp, str, 0);
 }
 
 int parse_events_copy_term_list(struct list_head *old,
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index ffa1a1b63796..545ab7cefc20 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -665,9 +665,11 @@ PE_NAME array '=' PE_VALUE
 PE_DRV_CFG_TERM
 {
 	struct parse_events_term *term;
+	char *config = strdup($1);
 
+	ABORT_ON(!config);
 	ABORT_ON(parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_DRV_CFG,
-					$1, $1, &@1, NULL));
+					config, $1, &@1, NULL));
 	$$ = term;
 }
 
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 133+ messages in thread
* [PATCH 43/63] perf parse: Add destructors for parse event terms
  2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (40 preceding siblings ...)
  2019-11-07 18:59 ` [PATCH 42/63] perf parse: Ensure config and str in terms are unique Arnaldo Carvalho de Melo
@ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo
  2019-11-07 18:59 ` [PATCH 44/63] perf parse: Before yyabort-ing free components Arnaldo Carvalho de Melo
                   ` (20 subsequent siblings)
  62 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Alexei Starovoitov, Andi Kleen, Daniel Borkmann, Jin Yao,
	John Garry, Kan Liang, Mark Rutland, Martin KaFai Lau,
	Peter Zijlstra, Song Liu, Stephane Eranian, Yonghong Song, bpf, cl

From: Ian Rogers <irogers@google.com>

If parsing fails, destructors are run to clean up the stack. Rename the head
union member to make the term and evlist use cases more distinct; this
simplifies matching the correct destructor.

Committer notes:

Jiri: "Nice did not know about this.. looks like it's been in bison for some time, right?"

Ian: "Looks like it wasn't in Bison 1 but in Bison 2, we're at Bison 3 and Bison 2 is > 14 years old: https://web.archive.org/web/20050924004158/http://www.gnu.org/software/bison/manual/html_mono/bison.html#Destructor-Decl"

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: clang-built-linux@googlegroups.com
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20191030223448.12930-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.y | 69 +++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 21 deletions(-)

diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 545ab7cefc20..035edfa8d42e 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -12,6 +12,7 @@
 #include <stdio.h>
 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/zalloc.h>
 #include "pmu.h"
 #include "evsel.h"
 #include "parse-events.h"
@@ -37,6 +38,25 @@ static struct list_head* alloc_list()
 	return list;
 }
 
+static void free_list_evsel(struct list_head* list_evsel)
+{
+	struct evsel *evsel, *tmp;
+
+	list_for_each_entry_safe(evsel, tmp, list_evsel, core.node) {
+		list_del_init(&evsel->core.node);
+		perf_evsel__delete(evsel);
+	}
+	free(list_evsel);
+}
+
+static void free_term(struct parse_events_term *term)
+{
+	if (term->type_val == PARSE_EVENTS__TERM_TYPE_STR)
+		free(term->val.str);
+	zfree(&term->array.ranges);
+	free(term);
+}
+
 static void inc_group_count(struct list_head *list,
 		       struct parse_events_state *parse_state)
 {
@@ -66,6 +86,7 @@ static void inc_group_count(struct list_head *list,
 %type <num> PE_VALUE_SYM_TOOL
 %type <num> PE_RAW
 %type <num> PE_TERM
+%type <num> value_sym
 %type <str> PE_NAME
 %type <str> PE_BPF_OBJECT
 %type <str> PE_BPF_SOURCE
@@ -76,37 +97,43 @@ static void inc_group_count(struct list_head *list,
 %type <str> PE_EVENT_NAME
 %type <str> PE_PMU_EVENT_PRE PE_PMU_EVENT_SUF PE_KERNEL_PMU_EVENT
 %type <str> PE_DRV_CFG_TERM
-%type <num> value_sym
-%type <head> event_config
-%type <head> opt_event_config
-%type <head> opt_pmu_config
+%destructor { free ($$); } <str>
 %type <term> event_term
-%type <head> event_pmu
-%type <head> event_legacy_symbol
-%type <head> event_legacy_cache
-%type <head> event_legacy_mem
-%type <head> event_legacy_tracepoint
+%destructor { free_term ($$); } <term>
+%type <list_terms> event_config
+%type <list_terms> opt_event_config
+%type <list_terms> opt_pmu_config
+%destructor { parse_events_terms__delete ($$); } <list_terms>
+%type <list_evsel> event_pmu
+%type <list_evsel> event_legacy_symbol
+%type <list_evsel> event_legacy_cache
+%type <list_evsel> event_legacy_mem
+%type <list_evsel> event_legacy_tracepoint
+%type <list_evsel> event_legacy_numeric
+%type <list_evsel> event_legacy_raw
+%type <list_evsel> event_bpf_file
+%type <list_evsel> event_def
+%type <list_evsel> event_mod
+%type <list_evsel> event_name
+%type <list_evsel> event
+%type <list_evsel> events
+%type <list_evsel> group_def
+%type <list_evsel> group
+%type <list_evsel> groups
+%destructor { free_list_evsel ($$); } <list_evsel>
 %type <tracepoint_name> tracepoint_name
-%type <head> event_legacy_numeric
-%type <head> event_legacy_raw
-%type <head> event_bpf_file
-%type <head> event_def
-%type <head> event_mod
-%type <head> event_name
-%type <head> event
-%type <head> events
-%type <head> group_def
-%type <head> group
-%type <head> groups
+%destructor { free ($$.sys); free ($$.event); } <tracepoint_name>
 %type <array> array
 %type <array> array_term
 %type <array> array_terms
+%destructor { free ($$.ranges); } <array>
 
 %union
 {
 	char *str;
 	u64 num;
-	struct list_head *head;
+	struct list_head *list_evsel;
+	struct list_head *list_terms;
 	struct parse_events_term *term;
 	struct tracepoint_name {
 		char *sys;
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 133+ messages in thread
* [PATCH 44/63] perf parse: Before yyabort-ing free components
  2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (41 preceding siblings ...)
  2019-11-07 18:59 ` [PATCH 43/63] perf parse: Add destructors for parse event terms Arnaldo Carvalho de Melo
@ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo
  2019-11-07 18:59 ` [PATCH 45/63] perf parse: If pmu configuration fails free terms Arnaldo Carvalho de Melo
                   ` (19 subsequent siblings)
  62 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Alexei Starovoitov, Andi Kleen, Daniel Borkmann, Jin Yao,
	John Garry, Kan Liang, Mark Rutland, Martin KaFai Lau,
	Peter Zijlstra, Song Liu, Stephane Eranian, Yonghong Song, bpf, cl

From: Ian Rogers <irogers@google.com>

YYABORT doesn't destruct the inputs of the rule whose action invokes it, so
they must be freed manually before YYABORT is used.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/parse-events.y | 252 ++++++++++++++++++++++++++------- 1 file changed, 197 insertions(+), 55 deletions(-) diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y index 035edfa8d42e..376b19855470 100644 --- a/tools/perf/util/parse-events.y +++ b/tools/perf/util/parse-events.y @@ -152,6 +152,7 @@ start_events: groups { struct parse_events_state *parse_state = _parse_state; + /* frees $1 */ parse_events_update_lists($1, &parse_state->list); } @@ -161,6 +162,7 @@ groups ',' group struct list_head *list = $1; struct list_head *group = $3; + /* frees $3 */ parse_events_update_lists(group, list); $$ = list; } @@ -170,6 +172,7 @@ groups ',' event struct list_head *list = $1; struct list_head *event = $3; + /* frees $3 */ parse_events_update_lists(event, list); $$ = list; } @@ -182,8 +185,14 @@ group: group_def ':' PE_MODIFIER_EVENT { struct list_head *list = $1; + int err; - ABORT_ON(parse_events__modifier_group(list, $3)); + err = parse_events__modifier_group(list, $3); + free($3); + if (err) { + free_list_evsel(list); + YYABORT; 
+ } $$ = list; } | @@ -196,6 +205,7 @@ PE_NAME '{' events '}' inc_group_count(list, _parse_state); parse_events__set_leader($1, list, _parse_state); + free($1); $$ = list; } | @@ -214,6 +224,7 @@ events ',' event struct list_head *event = $3; struct list_head *list = $1; + /* frees $3 */ parse_events_update_lists(event, list); $$ = list; } @@ -226,13 +237,19 @@ event_mod: event_name PE_MODIFIER_EVENT { struct list_head *list = $1; + int err; /* * Apply modifier on all events added by single event definition * (there could be more events added for multiple tracepoint * definitions via '*?'. */ - ABORT_ON(parse_events__modifier_event(list, $2, false)); + err = parse_events__modifier_event(list, $2, false); + free($2); + if (err) { + free_list_evsel(list); + YYABORT; + } $$ = list; } | @@ -241,8 +258,14 @@ event_name event_name: PE_EVENT_NAME event_def { - ABORT_ON(parse_events_name($2, $1)); + int err; + + err = parse_events_name($2, $1); free($1); + if (err) { + free_list_evsel($2); + YYABORT; + } $$ = $2; } | @@ -262,23 +285,33 @@ PE_NAME opt_pmu_config { struct parse_events_state *parse_state = _parse_state; struct parse_events_error *error = parse_state->error; - struct list_head *list, *orig_terms, *terms; + struct list_head *list = NULL, *orig_terms = NULL, *terms= NULL; + char *pattern = NULL; + +#define CLEANUP_YYABORT \ + do { \ + parse_events_terms__delete($2); \ + parse_events_terms__delete(orig_terms); \ + free($1); \ + free(pattern); \ + YYABORT; \ + } while(0) if (parse_events_copy_term_list($2, &orig_terms)) - YYABORT; + CLEANUP_YYABORT; if (error) error->idx = @1.first_column; list = alloc_list(); - ABORT_ON(!list); + if (!list) + CLEANUP_YYABORT; if (parse_events_add_pmu(_parse_state, list, $1, $2, false, false)) { struct perf_pmu *pmu = NULL; int ok = 0; - char *pattern; if (asprintf(&pattern, "%s*", $1) < 0) - YYABORT; + CLEANUP_YYABORT; while ((pmu = perf_pmu__scan(pmu)) != NULL) { char *name = pmu->name; @@ -287,31 +320,32 @@ PE_NAME 
opt_pmu_config strncmp($1, "uncore_", 7)) name += 7; if (!fnmatch(pattern, name, 0)) { - if (parse_events_copy_term_list(orig_terms, &terms)) { - free(pattern); - YYABORT; - } + if (parse_events_copy_term_list(orig_terms, &terms)) + CLEANUP_YYABORT; if (!parse_events_add_pmu(_parse_state, list, pmu->name, terms, true, false)) ok++; parse_events_terms__delete(terms); } } - free(pattern); - if (!ok) - YYABORT; + CLEANUP_YYABORT; } parse_events_terms__delete($2); parse_events_terms__delete(orig_terms); + free($1); $$ = list; +#undef CLEANUP_YYABORT } | PE_KERNEL_PMU_EVENT sep_dc { struct list_head *list; + int err; - if (parse_events_multi_pmu_add(_parse_state, $1, &list) < 0) + err = parse_events_multi_pmu_add(_parse_state, $1, &list); + free($1); + if (err < 0) YYABORT; $$ = list; } @@ -322,6 +356,8 @@ PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF sep_dc char pmu_name[128]; snprintf(&pmu_name, 128, "%s-%s", $1, $3); + free($1); + free($3); if (parse_events_multi_pmu_add(_parse_state, pmu_name, &list) < 0) YYABORT; $$ = list; @@ -338,11 +374,16 @@ value_sym '/' event_config '/' struct list_head *list; int type = $1 >> 16; int config = $1 & 255; + int err; list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_add_numeric(_parse_state, list, type, config, $3)); + err = parse_events_add_numeric(_parse_state, list, type, config, $3); parse_events_terms__delete($3); + if (err) { + free_list_evsel(list); + YYABORT; + } $$ = list; } | @@ -374,11 +415,19 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_e struct parse_events_state *parse_state = _parse_state; struct parse_events_error *error = parse_state->error; struct list_head *list; + int err; list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6)); + err = parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6); parse_events_terms__delete($6); + free($1); + free($3); + free($5); + if (err) { + 
free_list_evsel(list); + YYABORT; + } $$ = list; } | @@ -387,11 +436,18 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT opt_event_config struct parse_events_state *parse_state = _parse_state; struct parse_events_error *error = parse_state->error; struct list_head *list; + int err; list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4)); + err = parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4); parse_events_terms__delete($4); + free($1); + free($3); + if (err) { + free_list_evsel(list); + YYABORT; + } $$ = list; } | @@ -400,11 +456,17 @@ PE_NAME_CACHE_TYPE opt_event_config struct parse_events_state *parse_state = _parse_state; struct parse_events_error *error = parse_state->error; struct list_head *list; + int err; list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2)); + err = parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2); parse_events_terms__delete($2); + free($1); + if (err) { + free_list_evsel(list); + YYABORT; + } $$ = list; } @@ -413,11 +475,17 @@ PE_PREFIX_MEM PE_VALUE '/' PE_VALUE ':' PE_MODIFIER_BP sep_dc { struct parse_events_state *parse_state = _parse_state; struct list_head *list; + int err; list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_add_breakpoint(list, &parse_state->idx, - (void *) $2, $6, $4)); + err = parse_events_add_breakpoint(list, &parse_state->idx, + (void *) $2, $6, $4); + free($6); + if (err) { + free(list); + YYABORT; + } $$ = list; } | @@ -428,8 +496,11 @@ PE_PREFIX_MEM PE_VALUE '/' PE_VALUE sep_dc list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_add_breakpoint(list, &parse_state->idx, - (void *) $2, NULL, $4)); + if (parse_events_add_breakpoint(list, &parse_state->idx, + (void *) $2, NULL, $4)) { + free(list); + YYABORT; + } $$ = list; } | @@ -437,11 +508,17 @@ PE_PREFIX_MEM PE_VALUE ':' PE_MODIFIER_BP sep_dc { struct 
parse_events_state *parse_state = _parse_state; struct list_head *list; + int err; list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_add_breakpoint(list, &parse_state->idx, - (void *) $2, $4, 0)); + err = parse_events_add_breakpoint(list, &parse_state->idx, + (void *) $2, $4, 0); + free($4); + if (err) { + free(list); + YYABORT; + } $$ = list; } | @@ -452,8 +529,11 @@ PE_PREFIX_MEM PE_VALUE sep_dc list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_add_breakpoint(list, &parse_state->idx, - (void *) $2, NULL, 0)); + if (parse_events_add_breakpoint(list, &parse_state->idx, + (void *) $2, NULL, 0)) { + free(list); + YYABORT; + } $$ = list; } @@ -463,29 +543,35 @@ tracepoint_name opt_event_config struct parse_events_state *parse_state = _parse_state; struct parse_events_error *error = parse_state->error; struct list_head *list; + int err; list = alloc_list(); ABORT_ON(!list); if (error) error->idx = @1.first_column; - if (parse_events_add_tracepoint(list, &parse_state->idx, $1.sys, $1.event, - error, $2)) - return -1; + err = parse_events_add_tracepoint(list, &parse_state->idx, $1.sys, $1.event, + error, $2); + parse_events_terms__delete($2); + free($1.sys); + free($1.event); + if (err) { + free(list); + return -1; + } $$ = list; } tracepoint_name: PE_NAME '-' PE_NAME ':' PE_NAME { - char sys_name[128]; struct tracepoint_name tracepoint; - snprintf(&sys_name, 128, "%s-%s", $1, $3); - tracepoint.sys = &sys_name; + ABORT_ON(asprintf(&tracepoint.sys, "%s-%s", $1, $3) < 0); tracepoint.event = $5; - + free($1); + free($3); $$ = tracepoint; } | @@ -500,11 +586,16 @@ event_legacy_numeric: PE_VALUE ':' PE_VALUE opt_event_config { struct list_head *list; + int err; list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_add_numeric(_parse_state, list, (u32)$1, $3, $4)); + err = parse_events_add_numeric(_parse_state, list, (u32)$1, $3, $4); parse_events_terms__delete($4); + if (err) { + free(list); + YYABORT; + } $$ = list; } @@ -512,11 +603,16 @@ 
event_legacy_raw: PE_RAW opt_event_config { struct list_head *list; + int err; list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_add_numeric(_parse_state, list, PERF_TYPE_RAW, $1, $2)); + err = parse_events_add_numeric(_parse_state, list, PERF_TYPE_RAW, $1, $2); parse_events_terms__delete($2); + if (err) { + free(list); + YYABORT; + } $$ = list; } @@ -525,22 +621,33 @@ PE_BPF_OBJECT opt_event_config { struct parse_events_state *parse_state = _parse_state; struct list_head *list; + int err; list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_load_bpf(parse_state, list, $1, false, $2)); + err = parse_events_load_bpf(parse_state, list, $1, false, $2); parse_events_terms__delete($2); + free($1); + if (err) { + free(list); + YYABORT; + } $$ = list; } | PE_BPF_SOURCE opt_event_config { struct list_head *list; + int err; list = alloc_list(); ABORT_ON(!list); - ABORT_ON(parse_events_load_bpf(_parse_state, list, $1, true, $2)); + err = parse_events_load_bpf(_parse_state, list, $1, true, $2); parse_events_terms__delete($2); + if (err) { + free(list); + YYABORT; + } $$ = list; } @@ -573,6 +680,10 @@ opt_pmu_config: start_terms: event_config { struct parse_events_state *parse_state = _parse_state; + if (parse_state->terms) { + parse_events_terms__delete ($1); + YYABORT; + } parse_state->terms = $1; } @@ -582,7 +693,10 @@ event_config ',' event_term struct list_head *head = $1; struct parse_events_term *term = $3; - ABORT_ON(!head); + if (!head) { + free_term(term); + YYABORT; + } list_add_tail(&term->list, head); $$ = $1; } @@ -603,8 +717,12 @@ PE_NAME '=' PE_NAME { struct parse_events_term *term; - ABORT_ON(parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER, - $1, $3, &@1, &@3)); + if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER, + $1, $3, &@1, &@3)) { + free($1); + free($3); + YYABORT; + } $$ = term; } | @@ -612,8 +730,11 @@ PE_NAME '=' PE_VALUE { struct parse_events_term *term; - ABORT_ON(parse_events_term__num(&term, 
PARSE_EVENTS__TERM_TYPE_USER, - $1, $3, false, &@1, &@3)); + if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, + $1, $3, false, &@1, &@3)) { + free($1); + YYABORT; + } $$ = term; } | @@ -622,7 +743,10 @@ PE_NAME '=' PE_VALUE_SYM_HW struct parse_events_term *term; int config = $3 & 255; - ABORT_ON(parse_events_term__sym_hw(&term, $1, config)); + if (parse_events_term__sym_hw(&term, $1, config)) { + free($1); + YYABORT; + } $$ = term; } | @@ -630,8 +754,11 @@ PE_NAME { struct parse_events_term *term; - ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, - $1, 1, true, &@1, NULL)); + if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, + $1, 1, true, &@1, NULL)) { + free($1); + YYABORT; + } $$ = term; } | @@ -648,7 +775,10 @@ PE_TERM '=' PE_NAME { struct parse_events_term *term; - ABORT_ON(parse_events_term__str(&term, (int)$1, NULL, $3, &@1, &@3)); + if (parse_events_term__str(&term, (int)$1, NULL, $3, &@1, &@3)) { + free($3); + YYABORT; + } $$ = term; } | @@ -672,9 +802,13 @@ PE_NAME array '=' PE_NAME { struct parse_events_term *term; - ABORT_ON(parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER, - $1, $4, &@1, &@4)); - + if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER, + $1, $4, &@1, &@4)) { + free($1); + free($4); + free($2.ranges); + YYABORT; + } term->array = $2; $$ = term; } @@ -683,8 +817,12 @@ PE_NAME array '=' PE_VALUE { struct parse_events_term *term; - ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, - $1, $4, false, &@1, &@4)); + if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, + $1, $4, false, &@1, &@4)) { + free($1); + free($2.ranges); + YYABORT; + } term->array = $2; $$ = term; } @@ -695,8 +833,12 @@ PE_DRV_CFG_TERM char *config = strdup($1); ABORT_ON(!config); - ABORT_ON(parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_DRV_CFG, - config, $1, &@1, NULL)); + if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_DRV_CFG, + config, $1, &@1, NULL)) 
{ + free($1); + free(config); + YYABORT; + } $$ = term; } -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 45/63] perf parse: If pmu configuration fails free terms
  2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (42 preceding siblings ...)
  2019-11-07 18:59 ` [PATCH 44/63] perf parse: Before yyabort-ing free components Arnaldo Carvalho de Melo
@ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo
  2019-11-07 18:59 ` [PATCH 46/63] perf parse: Add a deep delete for parse event terms Arnaldo Carvalho de Melo
                   ` (18 subsequent siblings)
  62 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Alexei Starovoitov, Andi Kleen, Daniel Borkmann, Jin Yao,
	John Garry, Kan Liang, Mark Rutland, Martin KaFai Lau,
	Peter Zijlstra, Song Liu, Stephane Eranian, Yonghong Song, bpf, cl

From: Ian Rogers <irogers@google.com>

Avoid a memory leak when the configuration fails.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: clang-built-linux@googlegroups.com
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20191030223448.12930-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 578288c94d2a..a0a80f4e7038 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1388,8 +1388,15 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
 	if (get_config_terms(head_config, &config_terms))
 		return -ENOMEM;
 
-	if (perf_pmu__config(pmu, &attr, head_config, parse_state->error))
+	if (perf_pmu__config(pmu, &attr, head_config, parse_state->error)) {
+		struct perf_evsel_config_term *pos, *tmp;
+
+		list_for_each_entry_safe(pos, tmp, &config_terms, list) {
+			list_del_init(&pos->list);
+			free(pos);
+		}
 		return -EINVAL;
+	}
 
 	evsel = __add_event(list, &parse_state->idx, &attr,
 			    get_config_name(head_config), pmu,
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 133+ messages in thread
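
The fix relies on the "safe" list iterator, which keeps a second cursor so that the current node can be freed while walking. A minimal sketch of the same idea on a plain singly linked list, assuming a simplified node type; struct config_term, config_terms_add() and config_terms_free() are hypothetical and do not use the kernel's struct list_head:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical config term node; perf links these with struct list_head. */
struct config_term {
	int type;
	struct config_term *next;
};

/* Push a new node at the front; returns 0 on success, -1 on allocation failure. */
int config_terms_add(struct config_term **head, int type)
{
	struct config_term *t = malloc(sizeof(*t));

	if (!t)
		return -1;
	t->type = type;
	t->next = *head;
	*head = t;
	return 0;
}

/*
 * Free every node. 'tmp' holds the successor before 'pos' is freed, which
 * is the job the extra cursor does in list_for_each_entry_safe().
 * Returns the number of nodes released and leaves the list empty.
 */
int config_terms_free(struct config_term **head)
{
	struct config_term *pos, *tmp;
	int n = 0;

	for (pos = *head; pos; pos = tmp) {
		tmp = pos->next;
		free(pos);
		n++;
	}
	*head = NULL;
	return n;
}
```

Reading pos->next after free(pos) would be a use-after-free, which is why the plain, non-safe iterator cannot be used for teardown.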
* [PATCH 46/63] perf parse: Add a deep delete for parse event terms
  2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (43 preceding siblings ...)
  2019-11-07 18:59 ` [PATCH 45/63] perf parse: If pmu configuration fails free terms Arnaldo Carvalho de Melo
@ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo
  2019-11-07 18:59 ` [PATCH 47/63] perf symbols: Remove needless checks for map->groups->machine Arnaldo Carvalho de Melo
                   ` (17 subsequent siblings)
  62 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Alexei Starovoitov, Andi Kleen, Daniel Borkmann, Jin Yao,
	John Garry, Kan Liang, Mark Rutland, Martin KaFai Lau,
	Peter Zijlstra, Song Liu, Stephane Eranian, Yonghong Song, bpf, cl

From: Ian Rogers <irogers@google.com>

Add a parse_events_term deep delete function so that owned strings and
arrays are freed.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/parse-events.c | 16 +++++++++++++--- tools/perf/util/parse-events.h | 1 + tools/perf/util/parse-events.y | 12 ++---------- tools/perf/util/pmu.c | 2 +- 4 files changed, 17 insertions(+), 14 deletions(-) diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index a0a80f4e7038..6d18ff9bce49 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -2812,6 +2812,18 @@ int parse_events_term__clone(struct parse_events_term **new, return new_term(new, &temp, str, 0); } +void parse_events_term__delete(struct parse_events_term *term) +{ + if (term->array.nr_ranges) + zfree(&term->array.ranges); + + if (term->type_val != PARSE_EVENTS__TERM_TYPE_NUM) + zfree(&term->val.str); + + zfree(&term->config); + free(term); +} + int parse_events_copy_term_list(struct list_head *old, struct list_head **new) { @@ -2842,10 +2854,8 @@ void parse_events_terms__purge(struct list_head *terms) struct parse_events_term *term, *h; list_for_each_entry_safe(term, h, terms, list) { - if (term->array.nr_ranges) - 
zfree(&term->array.ranges); list_del_init(&term->list); - free(term); + parse_events_term__delete(term); } } diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h index 34f58d24a06a..5ee8ac93840c 100644 --- a/tools/perf/util/parse-events.h +++ b/tools/perf/util/parse-events.h @@ -139,6 +139,7 @@ int parse_events_term__sym_hw(struct parse_events_term **term, char *config, unsigned idx); int parse_events_term__clone(struct parse_events_term **new, struct parse_events_term *term); +void parse_events_term__delete(struct parse_events_term *term); void parse_events_terms__delete(struct list_head *terms); void parse_events_terms__purge(struct list_head *terms); void parse_events__clear_array(struct parse_events_array *a); diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y index 376b19855470..4cac830015be 100644 --- a/tools/perf/util/parse-events.y +++ b/tools/perf/util/parse-events.y @@ -49,14 +49,6 @@ static void free_list_evsel(struct list_head* list_evsel) free(list_evsel); } -static void free_term(struct parse_events_term *term) -{ - if (term->type_val == PARSE_EVENTS__TERM_TYPE_STR) - free(term->val.str); - zfree(&term->array.ranges); - free(term); -} - static void inc_group_count(struct list_head *list, struct parse_events_state *parse_state) { @@ -99,7 +91,7 @@ static void inc_group_count(struct list_head *list, %type <str> PE_DRV_CFG_TERM %destructor { free ($$); } <str> %type <term> event_term -%destructor { free_term ($$); } <term> +%destructor { parse_events_term__delete ($$); } <term> %type <list_terms> event_config %type <list_terms> opt_event_config %type <list_terms> opt_pmu_config @@ -694,7 +686,7 @@ event_config ',' event_term struct parse_events_term *term = $3; if (!head) { - free_term(term); + parse_events_term__delete(term); YYABORT; } list_add_tail(&term->list, head); diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index f9f427d4c313..db1e57113f4b 100644 --- a/tools/perf/util/pmu.c +++ 
b/tools/perf/util/pmu.c @@ -1260,7 +1260,7 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms, info->metric_name = alias->metric_name; list_del_init(&term->list); - free(term); + parse_events_term__delete(term); } /* -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
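For readers following along, the ownership idea behind a "deep delete" can be sketched outside of perf. This is only an illustration under stated assumptions: struct demo_term and every demo_* helper below are hypothetical stand-ins, not the real struct parse_events_term, and the return value (a count of released buffers) exists purely so the sketch can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parsed term; NOT the real perf structure. */
struct demo_term {
	bool is_num;			/* true: val.num is used; false: val.str is owned */
	union {
		unsigned long num;
		char *str;
	} val;
	char *config;			/* owned name string, may be NULL */
	int *ranges;			/* owned array, may be NULL */
	size_t nr_ranges;
};

/* Portable strdup() replacement so the sketch needs only ISO C. */
static char *demo_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Build a string-valued term that owns copies of both strings. */
static struct demo_term *demo_term__new_str(const char *config, const char *str)
{
	struct demo_term *term = calloc(1, sizeof(*term));

	if (!term)
		return NULL;
	term->config = demo_strdup(config);
	term->val.str = demo_strdup(str);
	if (!term->config || !term->val.str) {
		free(term->config);
		free(term->val.str);
		free(term);
		return NULL;
	}
	return term;
}

/*
 * Deep delete: release everything the term owns, then the term itself.
 * Returns how many owned buffers were released (illustration only).
 */
static int demo_term__delete(struct demo_term *term)
{
	int released = 0;

	assert(term != NULL);
	if (term->nr_ranges) {
		free(term->ranges);
		released++;
	}
	if (!term->is_num && term->val.str) {
		free(term->val.str);
		released++;
	}
	if (term->config) {
		free(term->config);
		released++;
	}
	free(term);
	return released;
}
```

A plain free(term) on such an object would leak config and val.str, which is the kind of leak the patch above closes by routing every caller through one delete function.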
* [PATCH 47/63] perf symbols: Remove needless checks for map->groups->machine 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (44 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 46/63] perf parse: Add a deep delete for parse event terms Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 48/63] perf machine: Add kernel_dso() method Arnaldo Carvalho de Melo ` (16 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter From: Arnaldo Carvalho de Melo <acme@redhat.com> It's sufficient to check whether map->groups is NULL before using it to read the ->machine value: when map->groups->machine is itself NULL, the assignment already yields NULL, so the second test is redundant. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-utiepyiv8b1tf8f79ok9d6j8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/symbol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index a4bd61cbc2a0..4ad39cc6368d 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -1617,7 +1617,7 @@ int dso__load(struct dso *dso, struct map *map) goto out; } - if (map->groups && map->groups->machine) + if (map->groups) machine = map->groups->machine; else machine = NULL; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
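The equivalence this patch relies on can be checked in isolation. The following is a minimal sketch, assuming hypothetical demo_* structures that only mimic the map -> groups -> machine pointer chain; it is not perf code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the map -> groups -> machine chain. */
struct demo_machine { int id; };
struct demo_groups { struct demo_machine *machine; };
struct demo_map { struct demo_groups *groups; };

/* Old form: tests both the container and the member. */
static struct demo_machine *machine_two_checks(const struct demo_map *map)
{
	if (map->groups && map->groups->machine)
		return map->groups->machine;
	return NULL;
}

/* New form: a NULL member is simply returned as NULL. */
static struct demo_machine *machine_one_check(const struct demo_map *map)
{
	if (map->groups)
		return map->groups->machine;
	return NULL;
}

/* Both forms must agree for any input. */
static void demo_assert_equivalent(const struct demo_map *map)
{
	assert(machine_two_checks(map) == machine_one_check(map));
}
```

The only case where the two forms could differ is groups != NULL with machine == NULL, and there both return NULL.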
* [PATCH 48/63] perf machine: Add kernel_dso() method 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (45 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 47/63] perf symbols: Remove needless checks for map->groups->machine Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 49/63] perf annotate: Fix heap overflow Arnaldo Carvalho de Melo ` (15 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter From: Arnaldo Carvalho de Melo <acme@redhat.com> To reduce boilerplate in some places. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-9s1bgoxxhlnu037e1nqx0tw3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/machine.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 24d9e284daad..e768ef24633f 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -42,6 +42,11 @@ static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock); +static struct dso *machine__kernel_dso(struct machine *machine) +{ + return machine->vmlinux_map->dso; +} + static void dsos__init(struct dsos *dsos) { INIT_LIST_HEAD(&dsos->head); @@ -861,7 +866,7 @@ size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp) { int i; size_t printed = 0; - struct dso *kdso = machine__kernel_map(machine)->dso; + struct dso *kdso = machine__kernel_dso(machine); if (kdso->has_build_id) { char filename[PATH_MAX]; @@ -1543,8 +1548,7 @@ static bool perf_event__is_extra_kernel_mmap(struct machine *machine, static 
int machine__process_extra_kernel_map(struct machine *machine, union perf_event *event) { - struct map *kernel_map = machine__kernel_map(machine); - struct dso *kernel = kernel_map ? kernel_map->dso : NULL; + struct dso *kernel = machine__kernel_dso(machine); struct extra_kernel_map xm = { .start = event->mmap.start, .end = event->mmap.start + event->mmap.len, -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
* [PATCH 49/63] perf annotate: Fix heap overflow 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (46 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 48/63] perf machine: Add kernel_dso() method Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 50/63] perf probe: Return a better scope DIE if there is no best scope Arnaldo Carvalho de Melo ` (14 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Ian Rogers, Alexander Shishkin, Jin Yao, Mark Rutland, Peter Zijlstra, Song Liu, Stephane Eranian, Arnaldo Carvalho de Melo From: Ian Rogers <irogers@google.com> Fix expand_tabs(), which copies the source line's '\0' and then appends another '\0' at a potentially out-of-bounds address. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20191026035644.217548-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/annotate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index ef1866a902c4..bee0fee122f8 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1892,7 +1892,7 @@ static char *expand_tabs(char *line, char **storage, size_t *storage_len) } /* Expand the last region.
*/ - len = line_len + 1 - src; + len = line_len - src; memcpy(&new_line[dst], &line[src], len); dst += len; new_line[dst] = '\0'; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
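The off-by-one is easier to see in a standalone tab expander. The sketch below is not the perf implementation: demo_expand_tabs() and DEMO_TAB_WIDTH are hypothetical, and the sizing pass is an assumption made to keep the example self-contained. What it does share with the patch is the last-region copy, which takes only the remaining characters and leaves the terminator to a single explicit store.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_TAB_WIDTH 8	/* assumed tab stop distance for this sketch */

/* Return a newly allocated copy of line with tabs expanded; caller frees. */
static char *demo_expand_tabs(const char *line)
{
	size_t line_len = strlen(line);
	size_t i, src = 0, dst = 0, new_len = 0, len;
	char *new_line;

	/* Pass 1: compute the expanded length, excluding the terminator. */
	for (i = 0; i < line_len; i++) {
		if (line[i] == '\t')
			new_len += DEMO_TAB_WIDTH - (new_len % DEMO_TAB_WIDTH);
		else
			new_len++;
	}

	new_line = malloc(new_len + 1);
	if (!new_line)
		return NULL;

	/* Pass 2: copy each region before a tab, then pad to the next stop. */
	for (i = 0; i < line_len; i++) {
		if (line[i] != '\t')
			continue;
		len = i - src;
		memcpy(&new_line[dst], &line[src], len);
		dst += len;
		len = DEMO_TAB_WIDTH - (dst % DEMO_TAB_WIDTH);
		memset(&new_line[dst], ' ', len);
		dst += len;
		src = i + 1;
	}

	/* Last region: copy the characters only, not the source '\0' ... */
	len = line_len - src;
	memcpy(&new_line[dst], &line[src], len);
	dst += len;
	assert(dst == new_len);
	/* ... so the one terminator written here stays inside the buffer. */
	new_line[dst] = '\0';
	return new_line;
}
```

With a length of line_len + 1 - src in the last copy, dst would end up one past the expanded length, and the final store would land one byte beyond an exactly sized buffer.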
* [PATCH 50/63] perf probe: Return a better scope DIE if there is no best scope 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (47 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 49/63] perf annotate: Fix heap overflow Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 18:59 ` [PATCH 51/63] perf probe: Skip end-of-sequence and non statement lines Arnaldo Carvalho de Melo ` (13 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo, Ravi Bangoria, Steven Rostedt, Tom Zanussi From: Masami Hiramatsu <mhiramat@kernel.org> Make find_best_scope() return the innermost DIE at the given address if there is no best-matched scope DIE. Since GCC sometimes generates counterintuitive line info that lies outside the inlined function's address range, we need this fixup. Without this, perf probe sometimes fails to probe on a line inside an inlined function: # perf probe -D ksys_open:3 Failed to find scope of probe point. Error: Failed to add events.
With this fix, 'perf probe' can probe it: # perf probe -D ksys_open:3 p:probe/ksys_open _text+25707308 p:probe/ksys_open_1 _text+25710596 p:probe/ksys_open_2 _text+25711114 p:probe/ksys_open_3 _text+25711343 p:probe/ksys_open_4 _text+25714058 p:probe/ksys_open_5 _text+2819653 p:probe/ksys_open_6 _text+2819701 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157291300887.19771.14936015360963292236.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/probe-finder.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c index 88e17a4f5ac3..582f8c34d93a 100644 --- a/tools/perf/util/probe-finder.c +++ b/tools/perf/util/probe-finder.c @@ -744,6 +744,16 @@ static int find_best_scope_cb(Dwarf_Die *fn_die, void *data) return 0; } +/* Return innermost DIE */ +static int find_inner_scope_cb(Dwarf_Die *fn_die, void *data) +{ + struct find_scope_param *fsp = data; + + memcpy(fsp->die_mem, fn_die, sizeof(Dwarf_Die)); + fsp->found = true; + return 1; +} + /* Find an appropriate scope fits to given conditions */ static Dwarf_Die *find_best_scope(struct probe_finder *pf, Dwarf_Die *die_mem) { @@ -755,8 +765,13 @@ static Dwarf_Die *find_best_scope(struct probe_finder *pf, Dwarf_Die *die_mem) .die_mem = die_mem, .found = false, }; + int ret; - cu_walk_functions_at(&pf->cu_die, pf->addr, find_best_scope_cb, &fsp); + ret = cu_walk_functions_at(&pf->cu_die, pf->addr, find_best_scope_cb, + &fsp); + if (!ret && !fsp.found) + cu_walk_functions_at(&pf->cu_die, pf->addr, + find_inner_scope_cb, &fsp); return fsp.found ? 
die_mem : NULL; } -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
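The "strict pass first, permissive pass as fallback" shape of this fix can be sketched with plain arrays. Everything below is hypothetical (struct demo_scope, the demo_* callbacks, and the line-number match criterion are assumptions for illustration, and the array is assumed to be ordered innermost scope first); the real code walks DWARF DIEs through libdw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scope record: an address range plus its declaration line. */
struct demo_scope {
	unsigned long low, high;	/* covers [low, high) */
	int decl_line;
};

struct demo_find_param {
	int line;			/* line the probe asks for */
	const struct demo_scope *found;
};

typedef int (*demo_scope_cb)(const struct demo_scope *scope, void *data);

/* Visit scopes containing addr in array order; stop when cb returns non-zero. */
static int demo_walk_scopes_at(const struct demo_scope *scopes, size_t n,
			       unsigned long addr, demo_scope_cb cb, void *data)
{
	size_t i;
	int ret = 0;

	assert(scopes != NULL || n == 0);
	for (i = 0; i < n && !ret; i++) {
		if (scopes[i].low <= addr && addr < scopes[i].high)
			ret = cb(&scopes[i], data);
	}
	return ret;
}

/* Strict pass: accept only a scope that matches the requested line. */
static int demo_best_scope_cb(const struct demo_scope *scope, void *data)
{
	struct demo_find_param *p = data;

	if (scope->decl_line != p->line)
		return 0;		/* keep walking */
	p->found = scope;
	return 1;
}

/* Fallback pass: accept the first (innermost) scope unconditionally. */
static int demo_inner_scope_cb(const struct demo_scope *scope, void *data)
{
	struct demo_find_param *p = data;

	p->found = scope;
	return 1;
}

static const struct demo_scope *demo_find_scope(const struct demo_scope *scopes,
						size_t n, unsigned long addr,
						int line)
{
	struct demo_find_param p = { .line = line, .found = NULL };
	int ret;

	ret = demo_walk_scopes_at(scopes, n, addr, demo_best_scope_cb, &p);
	if (!ret && !p.found)
		demo_walk_scopes_at(scopes, n, addr, demo_inner_scope_cb, &p);
	return p.found;
}
```

The fallback runs only when the first walk finished without error and without a match, mirroring the !ret && !fsp.found condition in the diff.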
* [PATCH 51/63] perf probe: Skip end-of-sequence and non statement lines 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (48 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 50/63] perf probe: Return a better scope DIE if there is no best scope Arnaldo Carvalho de Melo @ 2019-11-07 18:59 ` Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 52/63] perf probe: Filter out instances except for inlined subroutine and subprogram Arnaldo Carvalho de Melo ` (12 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 18:59 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo, Jiri Olsa From: Masami Hiramatsu <mhiramat@kernel.org> Skip end-of-sequence and non-statement lines while walking through lines list. The "end-of-sequence" line information means: "the current address is that of the first byte after the end of a sequence of target machine instructions." (DWARF version 4 spec 6.2.2) This actually means out of scope and we can not probe on it. On the other hand, the statement lines (is_stmt) means: "the current instruction is a recommended breakpoint location. A recommended breakpoint location is intended to “represent” a line, a statement and/or a semantically distinct subpart of a statement." (DWARF version 4 spec 6.2.2) So, non-statement line info also should be skipped. These can reduce unneeded probe points and also avoid an error. E.g. 
without this patch: # perf probe -a "clear_tasks_mm_cpumask:1" Added new events: probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_3 (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_4 (on clear_tasks_mm_cpumask:1) You can now use it in all perf tools, such as: perf record -e probe:clear_tasks_mm_cpumask_4 -aR sleep 1 # This puts 5 probes on one line, but actually it is not an inlined function. This is because there are many non-statement instructions in the function prologue. With this patch: # perf probe -a "clear_tasks_mm_cpumask:1" Added new event: probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1) You can now use it in all perf tools, such as: perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1 # Now perf-probe skips unneeded addresses. Committer testing: Slightly different results, but similar: Before: # uname -a Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # # perf probe -a "clear_tasks_mm_cpumask:1" Added new events: probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1) You can now use it in all perf tools, such as: perf record -e probe:clear_tasks_mm_cpumask_2 -aR sleep 1 # After: # perf probe -a "clear_tasks_mm_cpumask:1" Added new event: probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1) You can now use it in all perf tools, such as: perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1 # perf probe -l probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c) # Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung
Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/157241936090.32002.12156347518596111660.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/dwarf-aux.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c index ac82fd937e4b..f31001d13bfb 100644 --- a/tools/perf/util/dwarf-aux.c +++ b/tools/perf/util/dwarf-aux.c @@ -782,6 +782,7 @@ int die_walk_lines(Dwarf_Die *rt_die, line_walk_callback_t callback, void *data) int decl = 0, inl; Dwarf_Die die_mem, *cu_die; size_t nlines, i; + bool flag; /* Get the CU die */ if (dwarf_tag(rt_die) != DW_TAG_compile_unit) { @@ -812,6 +813,12 @@ int die_walk_lines(Dwarf_Die *rt_die, line_walk_callback_t callback, void *data) "Possible error in debuginfo.\n"); continue; } + /* Skip end-of-sequence */ + if (dwarf_lineendsequence(line, &flag) != 0 || flag) + continue; + /* Skip Non statement line-info */ + if (dwarf_linebeginstatement(line, &flag) != 0 || !flag) + continue; /* Filter lines based on address */ if (rt_die != cu_die) { /* -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
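As a rough illustration of why the two new checks collapse several probe points into one, here is a sketch over a hand-written line table. struct demo_line_row and demo_collect_probe_addrs() are hypothetical; the real code queries libdw for each Dwarf_Line instead of reading flags from a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical line-table row carrying the two flags discussed above. */
struct demo_line_row {
	unsigned long addr;
	int lineno;
	bool is_stmt;		/* recommended breakpoint location */
	bool end_sequence;	/* first byte after a sequence, not probeable */
};

/* Collect candidate probe addresses for lineno; returns how many were stored. */
static size_t demo_collect_probe_addrs(const struct demo_line_row *rows,
				       size_t n, int lineno,
				       unsigned long *out, size_t max_out)
{
	size_t i, found = 0;

	assert(out != NULL || max_out == 0);
	for (i = 0; i < n && found < max_out; i++) {
		/* Skip end-of-sequence rows */
		if (rows[i].end_sequence)
			continue;
		/* Skip non-statement rows */
		if (!rows[i].is_stmt)
			continue;
		if (rows[i].lineno != lineno)
			continue;
		out[found++] = rows[i].addr;
	}
	return found;
}
```

In a table where a prologue contributes several non-statement rows for the same source line, only the is_stmt row survives, which matches the drop from five events to one shown in the commit message.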
* [PATCH 52/63] perf probe: Filter out instances except for inlined subroutine and subprogram 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (49 preceding siblings ...) 2019-11-07 18:59 ` [PATCH 51/63] perf probe: Skip end-of-sequence and non statement lines Arnaldo Carvalho de Melo @ 2019-11-07 19:00 ` Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 53/63] perf probe: Fix to show calling lines of inlined functions Arnaldo Carvalho de Melo ` (11 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo, Jiri Olsa From: Masami Hiramatsu <mhiramat@kernel.org> Filter out instances except for inlined_subroutine and subprogram DIE in die_walk_instances() and die_is_func_instance(). This fixes an issue where perf probe sets some probes on the calling address instead of on the target function itself. When perf probe walks the instances of an abstract origin (a kind of function prototype of an inlined function), die_walk_instances() can also pass a GNU_call_site (a GNU extension for call sites) to the callback. Since that is not an inlined instance of the target function, we have to filter it out when searching for a probe point. Without this patch, perf probe sets probes on call site addresses too. This can happen with a function which is marked "inlined" but has an actual symbol.
(I'm not sure why GCC mark it "inlined"): # perf probe -D vfs_read p:probe/vfs_read _text+2500017 p:probe/vfs_read_1 _text+2499468 p:probe/vfs_read_2 _text+2499563 p:probe/vfs_read_3 _text+2498876 p:probe/vfs_read_4 _text+2498512 p:probe/vfs_read_5 _text+2498627 With this patch: Slightly different results, similar tho: # perf probe -D vfs_read p:probe/vfs_read _text+2498512 Committer testing: # uname -a Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux Before: # perf probe -D vfs_read p:probe/vfs_read _text+3131557 p:probe/vfs_read_1 _text+3130975 p:probe/vfs_read_2 _text+3131047 p:probe/vfs_read_3 _text+3130380 p:probe/vfs_read_4 _text+3130000 # uname -a Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # After: # perf probe -D vfs_read p:probe/vfs_read _text+3130000 # Fixes: db0d2c6420ee ("perf probe: Search concrete out-of-line instances") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/157241937063.32002.11024544873990816590.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/dwarf-aux.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c index f31001d13bfb..ac1289043204 100644 --- a/tools/perf/util/dwarf-aux.c +++ b/tools/perf/util/dwarf-aux.c @@ -334,18 +334,22 @@ int die_entrypc(Dwarf_Die *dw_die, Dwarf_Addr *addr) * @dw_die: a DIE * * Ensure that this DIE is an instance (which has an entry address). - * This returns true if @dw_die is a function instance. If not, you need to - * call die_walk_instances() to find actual instances. + * This returns true if @dw_die is a function instance. If not, the @dw_die + * must be a prototype. 
You can use die_walk_instances() to find actual + * instances. **/ bool die_is_func_instance(Dwarf_Die *dw_die) { Dwarf_Addr tmp; Dwarf_Attribute attr_mem; + int tag = dwarf_tag(dw_die); - /* Actually gcc optimizes non-inline as like as inlined */ - return !dwarf_func_inline(dw_die) && - (dwarf_entrypc(dw_die, &tmp) == 0 || - dwarf_attr(dw_die, DW_AT_ranges, &attr_mem) != NULL); + if (tag != DW_TAG_subprogram && + tag != DW_TAG_inlined_subroutine) + return false; + + return dwarf_entrypc(dw_die, &tmp) == 0 || + dwarf_attr(dw_die, DW_AT_ranges, &attr_mem) != NULL; } /** @@ -624,6 +628,9 @@ static int __die_walk_instances_cb(Dwarf_Die *inst, void *data) Dwarf_Die *origin; int tmp; + if (!die_is_func_instance(inst)) + return DIE_FIND_CB_CONTINUE; + attr = dwarf_attr(inst, DW_AT_abstract_origin, &attr_mem); if (attr == NULL) return DIE_FIND_CB_CONTINUE; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
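The tag test added to die_is_func_instance() can be modelled without libdw. In this sketch enum demo_tag and struct demo_die are hypothetical simplifications: the two booleans stand in for "dwarf_entrypc() succeeded" and "a DW_AT_ranges attribute exists".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag set; real code compares against DW_TAG_* constants. */
enum demo_tag {
	DEMO_TAG_SUBPROGRAM,
	DEMO_TAG_INLINED_SUBROUTINE,
	DEMO_TAG_CALL_SITE,
};

struct demo_die {
	enum demo_tag tag;
	bool has_entry_pc;	/* stands in for a successful entry-pc lookup */
	bool has_ranges;	/* stands in for a present ranges attribute */
};

/* True only for function-like DIEs that actually occupy code addresses. */
static bool demo_die_is_func_instance(const struct demo_die *die)
{
	assert(die != NULL);
	if (die->tag != DEMO_TAG_SUBPROGRAM &&
	    die->tag != DEMO_TAG_INLINED_SUBROUTINE)
		return false;

	return die->has_entry_pc || die->has_ranges;
}
```

A call-site entry may well carry an address, so checking for an address alone is not enough; rejecting on the tag first is what keeps it from being treated as an instance.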
* [PATCH 53/63] perf probe: Fix to show calling lines of inlined functions 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (50 preceding siblings ...) 2019-11-07 19:00 ` [PATCH 52/63] perf probe: Filter out instances except for inlined subroutine and subprogram Arnaldo Carvalho de Melo @ 2019-11-07 19:00 ` Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 54/63] perf probe: Skip overlapped location on searching variables Arnaldo Carvalho de Melo ` (10 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo, Jiri Olsa From: Masami Hiramatsu <mhiramat@kernel.org> Fix to show calling lines of inlined functions (where an inline function is called). die_walk_lines() filtered out the lines inside inlined functions based on the address. However this also filtered out the lines which call those inlined functions from the target function. To solve this issue, check the call_file and call_line attributes and do not filter out if it matches to the line information. Without this fix, perf probe -L doesn't show some lines correctly. 
(don't see the lines after 17) # perf probe -L vfs_read <vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0> 0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos) 1 { 2 ssize_t ret; 4 if (!(file->f_mode & FMODE_READ)) return -EBADF; 6 if (!(file->f_mode & FMODE_CAN_READ)) return -EINVAL; 8 if (unlikely(!access_ok(buf, count))) return -EFAULT; 11 ret = rw_verify_area(READ, file, pos, count); 12 if (!ret) { 13 if (count > MAX_RW_COUNT) count = MAX_RW_COUNT; 15 ret = __vfs_read(file, buf, count, pos); 16 if (ret > 0) { fsnotify_access(file); add_rchar(current, ret); } With this fix: # perf probe -L vfs_read <vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0> 0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos) 1 { 2 ssize_t ret; 4 if (!(file->f_mode & FMODE_READ)) return -EBADF; 6 if (!(file->f_mode & FMODE_CAN_READ)) return -EINVAL; 8 if (unlikely(!access_ok(buf, count))) return -EFAULT; 11 ret = rw_verify_area(READ, file, pos, count); 12 if (!ret) { 13 if (count > MAX_RW_COUNT) count = MAX_RW_COUNT; 15 ret = __vfs_read(file, buf, count, pos); 16 if (ret > 0) { 17 fsnotify_access(file); 18 add_rchar(current, ret); } 20 inc_syscr(current); } Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/157241937995.32002.17899884017011512577.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/dwarf-aux.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c index ac1289043204..5544bfbd0f6c 100644 --- a/tools/perf/util/dwarf-aux.c +++ b/tools/perf/util/dwarf-aux.c @@ -784,7 +784,7 @@ int die_walk_lines(Dwarf_Die *rt_die, line_walk_callback_t callback, void *data) 
Dwarf_Lines *lines; Dwarf_Line *line; Dwarf_Addr addr; - const char *fname, *decf = NULL; + const char *fname, *decf = NULL, *inf = NULL; int lineno, ret = 0; int decl = 0, inl; Dwarf_Die die_mem, *cu_die; @@ -835,13 +835,21 @@ int die_walk_lines(Dwarf_Die *rt_die, line_walk_callback_t callback, void *data) */ if (!dwarf_haspc(rt_die, addr)) continue; + if (die_find_inlinefunc(rt_die, addr, &die_mem)) { + /* Call-site check */ + inf = die_get_call_file(&die_mem); + if ((inf && !strcmp(inf, decf)) && + die_get_call_lineno(&die_mem) == lineno) + goto found; + dwarf_decl_line(&die_mem, &inl); if (inl != decl || decf != dwarf_decl_file(&die_mem)) continue; } } +found: /* Get source line */ fname = dwarf_linesrc(line, NULL, NULL); -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
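The decision made for an address that falls inside an inlined function can be written as a small predicate. This is a sketch under assumptions: struct demo_inline_instance and demo_keep_line() are hypothetical, and file names are compared with strcmp() throughout, whereas the patch compares the declaration file by pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the inlined instance covering an address. */
struct demo_inline_instance {
	const char *call_file;	/* file containing the call, may be NULL */
	int call_line;		/* line of the call */
	const char *decl_file;	/* file declaring the inlined function */
	int decl_line;		/* line of that declaration */
};

/*
 * Decide whether a line-table row (lineno) should be reported for the
 * function declared at decf:decl. inl is NULL when the row's address is
 * not inside any inlined function.
 */
static bool demo_keep_line(const struct demo_inline_instance *inl,
			   const char *decf, int decl, int lineno)
{
	assert(decf != NULL);
	if (!inl)
		return true;

	/* Call-site check: the row is the line that calls the inlined code. */
	if (inl->call_file && strcmp(inl->call_file, decf) == 0 &&
	    inl->call_line == lineno)
		return true;

	/* Otherwise keep it only if the instance is the target function itself. */
	return inl->decl_line == decl && inl->decl_file &&
	       strcmp(inl->decl_file, decf) == 0;
}
```

Before the fix only the last test existed, so rows that merely call an inlined function were dropped along with the inlined body.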
* [PATCH 54/63] perf probe: Skip overlapped location on searching variables 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (51 preceding siblings ...) 2019-11-07 19:00 ` [PATCH 53/63] perf probe: Fix to show calling lines of inlined functions Arnaldo Carvalho de Melo @ 2019-11-07 19:00 ` Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 55/63] perf record: Add support for limit perf output file size Arnaldo Carvalho de Melo ` (9 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Masami Hiramatsu, Arnaldo Carvalho de Melo, Jiri Olsa From: Masami Hiramatsu <mhiramat@kernel.org> Since the debuginfo__find_probes() callback function can be called with a location that has already been passed, the callback function must filter out such overlapped locations. add_probe_trace_event() already does this since commit 1a375ae7659a ("perf probe: Skip same probe address for a given line"), but add_available_vars() doesn't.
Thus perf probe -V shows the same address repeatedly, as below: # perf probe -V vfs_read:18 Available variables at vfs_read:18 @<vfs_read+217> char* buf loff_t* pos ssize_t ret struct file* file @<vfs_read+217> char* buf loff_t* pos ssize_t ret struct file* file @<vfs_read+226> char* buf loff_t* pos ssize_t ret struct file* file With this fix, perf probe -V shows it correctly: # perf probe -V vfs_read:18 Available variables at vfs_read:18 @<vfs_read+217> char* buf loff_t* pos ssize_t ret struct file* file @<vfs_read+226> char* buf loff_t* pos ssize_t ret struct file* file Fixes: cf6eb489e5c0 ("perf probe: Show accessible local variables") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/157241938927.32002.4026859017790562751.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/probe-finder.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c index 582f8c34d93a..9ecea45da4ca 100644 --- a/tools/perf/util/probe-finder.c +++ b/tools/perf/util/probe-finder.c @@ -1428,6 +1428,18 @@ static int collect_variables_cb(Dwarf_Die *die_mem, void *data) return DIE_FIND_CB_END; } +static bool available_var_finder_overlap(struct available_var_finder *af) +{ + int i; + + for (i = 0; i < af->nvls; i++) { + if (af->pf.addr == af->vls[i].point.address) + return true; + } + return false; + +} + /* Add a found vars into available variables list */ static int add_available_vars(Dwarf_Die *sc_die, struct probe_finder *pf) { @@ -1438,6 +1450,14 @@ static int add_available_vars(Dwarf_Die *sc_die, struct probe_finder *pf) Dwarf_Die die_mem; int ret; + /* + * For some reason (e.g. different column assigned to same address), + * this callback can be called with the address which already passed. + * Ignore it first.
+ */ + if (available_var_finder_overlap(af)) + return 0; + /* Check number of tevs */ if (af->nvls == af->max_vls) { pr_warning("Too many( > %d) probe point found.\n", af->max_vls); -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
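The duplicate suppression amounts to a linear scan over the addresses already recorded. A minimal sketch follows, assuming a hypothetical fixed-capacity struct demo_var_finder; the return codes of demo_add_point() are invented for the example and do not match the perf callback convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_POINTS 16	/* arbitrary capacity for the sketch */

/* Hypothetical accumulator of probe-point addresses seen so far. */
struct demo_var_finder {
	unsigned long addrs[DEMO_MAX_POINTS];
	size_t nr;
};

/* Has this address already been recorded? */
static bool demo_finder_overlap(const struct demo_var_finder *f,
				unsigned long addr)
{
	size_t i;

	for (i = 0; i < f->nr; i++) {
		if (f->addrs[i] == addr)
			return true;
	}
	return false;
}

/* Returns 1 if added, 0 if skipped as a duplicate, -1 if the list is full. */
static int demo_add_point(struct demo_var_finder *f, unsigned long addr)
{
	assert(f != NULL);
	if (demo_finder_overlap(f, addr))
		return 0;
	if (f->nr == DEMO_MAX_POINTS)
		return -1;
	f->addrs[f->nr++] = addr;
	return 1;
}
```

Doing the overlap check before the capacity check mirrors the order in the diff, so a repeated address never triggers the "too many probe points" path.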
* [PATCH 55/63] perf record: Add support for limit perf output file size 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (52 preceding siblings ...) 2019-11-07 19:00 ` [PATCH 54/63] perf probe: Skip overlapped location on searching variables Arnaldo Carvalho de Melo @ 2019-11-07 19:00 ` Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 56/63] perf tests: Fix out of bounds memory access Arnaldo Carvalho de Melo ` (8 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Jiwei Sun, Arnaldo Carvalho de Melo, Adrian Hunter, Alexander Shishkin, Michael Petlan, Peter Zijlstra, Richard Danter From: Jiwei Sun <jiwei.sun@windriver.com> This patch adds a new option to limit the output file size. Based on it, we can create a wrapper around the perf command that uses the option to keep an unaware user from exhausting the disk space. In order to keep perf.data parsable, we limit only the sample data size; since perf.data consists of headers, sample data and other data, the actual size of the recorded file will be bigger than the configured value. Testing it: # ./perf record -a -g --max-size=10M Couldn't synthesize bpf events. [ perf record: perf size limit reached (10249 KB), stopping session ] [ perf record: Woken up 32 times to write data ] [ perf record: Captured and wrote 10.133 MB perf.data (71964 samples) ] # ls -lh perf.data -rw------- 1 root root 11M Oct 22 14:32 perf.data # ./perf record -a -g --max-size=10K [ perf record: perf size limit reached (10 KB), stopping session ] Couldn't synthesize bpf events.
[ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.546 MB perf.data (69 samples) ] # ls -l perf.data -rw------- 1 root root 1626952 Oct 22 14:36 perf.data Committer notes: Fixed the build in multiple distros by using PRIu64 to print u64 struct members, fixing this: builtin-record.c: In function 'record__write': builtin-record.c:150:5: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'u64' [-Werror=format=] rec->bytes_written >> 10); ^ CC /tmp/build/pe Signed-off-by: Jiwei Sun <jiwei.sun@windriver.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Danter <richard.danter@windriver.com> Link: http://lore.kernel.org/lkml/20191022080901.3841-1-jiwei.sun@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/Documentation/perf-record.txt | 4 +++ tools/perf/builtin-record.c | 46 +++++++++++++++++++++++- 2 files changed, 49 insertions(+), 1 deletion(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 8a4506113d9f..ebcba1f95513 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -574,6 +574,10 @@ Implies --tail-synthesize. --kcore:: Make a copy of /proc/kcore and place it into a directory with the perf data file. 
+--max-size=<size>:: +Limit the sample data max size, <size> is expected to be a number with +appended unit character - B/K/M/G + SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-list[1] diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index f6664bb08b26..b95c000c1ed9 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -94,8 +94,11 @@ struct record { struct switch_output switch_output; unsigned long long samples; cpu_set_t affinity_mask; + unsigned long output_max_size; /* = 0: unlimited */ }; +static volatile int done; + static volatile int auxtrace_record__snapshot_started; static DEFINE_TRIGGER(auxtrace_snapshot_trigger); static DEFINE_TRIGGER(switch_output_trigger); @@ -123,6 +126,12 @@ static bool switch_output_time(struct record *rec) trigger_is_ready(&switch_output_trigger); } +static bool record__output_max_size_exceeded(struct record *rec) +{ + return rec->output_max_size && + (rec->bytes_written >= rec->output_max_size); +} + static int record__write(struct record *rec, struct mmap *map __maybe_unused, void *bf, size_t size) { @@ -135,6 +144,13 @@ static int record__write(struct record *rec, struct mmap *map __maybe_unused, rec->bytes_written += size; + if (record__output_max_size_exceeded(rec) && !done) { + fprintf(stderr, "[ perf record: perf size limit reached (%" PRIu64 " KB)," + " stopping session ]\n", + rec->bytes_written >> 10); + done = 1; + } + if (switch_output_size(rec)) trigger_hit(&switch_output_trigger); @@ -499,7 +515,6 @@ static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size) return record__write(rec, map, bf, size); } -static volatile int done; static volatile int signr = -1; static volatile int child_finished; @@ -1984,6 +1999,33 @@ static int record__parse_affinity(const struct option *opt, const char *str, int return 0; } +static int parse_output_max_size(const struct option *opt, + const char *str, int unset) +{ + unsigned long *s = (unsigned long 
*)opt->value; + static struct parse_tag tags_size[] = { + { .tag = 'B', .mult = 1 }, + { .tag = 'K', .mult = 1 << 10 }, + { .tag = 'M', .mult = 1 << 20 }, + { .tag = 'G', .mult = 1 << 30 }, + { .tag = 0 }, + }; + unsigned long val; + + if (unset) { + *s = 0; + return 0; + } + + val = parse_tag_value(str, tags_size); + if (val != (unsigned long) -1) { + *s = val; + return 0; + } + + return -1; +} + static int record__parse_mmap_pages(const struct option *opt, const char *str, int unset __maybe_unused) @@ -2311,6 +2353,8 @@ static struct option __record_options[] = { "n", "Compressed records using specified level (default: 1 - fastest compression, 22 - greatest compression)", record__parse_comp_level), #endif + OPT_CALLBACK(0, "max-size", &record.output_max_size, + "size", "Limit the maximum size of the output file", parse_output_max_size), OPT_END() }; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
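The size parsing and the limit check in the patch above can be illustrated with a standalone sketch. This is not perf's code: `parse_size` and `limit_reached` are hypothetical names, and the parser is a simplified stand-in for `parse_tag_value()` that assumes a decimal number followed by exactly one of B/K/M/G.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "<number><unit>" where unit is one of
 * B, K, M or G (powers of 1024). Returns 0 and stores the byte count
 * in *out on success, -1 on malformed input or overflow.
 */
static int parse_size(const char *str, uint64_t *out)
{
	char *end;
	unsigned long long num;
	unsigned int shift;

	/* Require a leading digit: no sign, no leading whitespace. */
	if (!str || *str < '0' || *str > '9')
		return -1;

	errno = 0;
	num = strtoull(str, &end, 10);
	if (errno || end == str)
		return -1;

	switch (*end) {
	case 'B': shift = 0;  break;
	case 'K': shift = 10; break;
	case 'M': shift = 20; break;
	case 'G': shift = 30; break;
	default:  return -1;	/* missing or unknown unit */
	}
	if (end[1] != '\0')
		return -1;	/* trailing characters after the unit */

	/* Reject values that would not fit in 64 bits after scaling. */
	if (num > (UINT64_MAX >> shift))
		return -1;

	*out = (uint64_t)num << shift;
	return 0;
}

/* A limit of 0 means "unlimited", mirroring the patch's convention. */
static bool limit_reached(uint64_t bytes_written, uint64_t max_size)
{
	return max_size && bytes_written >= max_size;
}
```

In the patch the check runs after each write has been added to `bytes_written`, so the output file can end up slightly larger than the requested limit.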
* [PATCH 56/63] perf tests: Fix out of bounds memory access
From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Leo Yan, Alexander Shishkin, Mark Rutland, Naresh Kamboju, Peter Zijlstra, Wang Nan, stable, Arnaldo Carvalho de Melo

From: Leo Yan <leo.yan@linaro.org>

The test case 'Read backward ring buffer' failed on 32-bit architectures
during LKFT perf testing. It failed on the arm32 x15 device, qemu_arm32
and qemu_i386, and failed intermittently on i386; the failure log is
below:

  50: Read backward ring buffer                    :
  --- start ---
  test child forked, pid 510
  Using CPUID GenuineIntel-6-9E-9
  mmap size 1052672B
  mmap size 8192B
  Finished reading overwrite ring buffer: rewind
  free(): invalid next size (fast)
  test child interrupted
  ---- end ----
  Read backward ring buffer: FAILED!

The log hints at a memory usage problem: free() reports the error
'invalid next size' and the test exits immediately. The root cause is an
out of bounds memory access on the data array 'evsel->id'.

The backward ring buffer test invokes do_test() twice. 'evsel->id' is
allocated at the first call with the flow:

  test__backward_ring_buffer()
    `-> do_test()
          `-> evlist__mmap()
                `-> evlist__mmap_ex()
                      `-> perf_evsel__alloc_id()

So 'evsel->id' is allocated with one item, and it is used in
perf_evlist__id_add():

   evsel->id[0] = id
   evsel->ids   = 1

At the second call of do_test(), the initialization of 'evsel->id' is
skipped and the array allocated in the first call is reused, but
'evsel->ids' still contains the stale value. Thus:

   evsel->id[1] = id    -> out of bound access
   evsel->ids   = 2

To fix this issue, use the evlist__open() and evlist__close() pair to
prepare and clean up the evlist context, so that 'evsel->id' and
'evsel->ids' are initialized properly when do_test() is invoked and the
out of bounds memory access is avoided.

Fixes: ee74701ed8ad ("perf tests: Add test to check backward ring buffer")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org # v4.10+
Link: http://lore.kernel.org/lkml/20191107020244.2427-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/backward-ring-buffer.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index a4cd30c0beb3..15cea518f5ad 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -148,6 +148,15 @@ int test__backward_ring_buffer(struct test *test __maybe_unused, int subtest __m
 		goto out_delete_evlist;
 	}
 
+	evlist__close(evlist);
+
+	err = evlist__open(evlist);
+	if (err < 0) {
+		pr_debug("perf_evlist__open: %s\n",
+			 str_error_r(errno, sbuf, sizeof(sbuf)));
+		goto out_delete_evlist;
+	}
+
 	err = do_test(evlist, 1, &sample_count, &comm_count);
 	if (err != TEST_OK)
 		goto out_delete_evlist;
-- 
2.21.0
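The failure mode described in the commit message can be reproduced in miniature. The sketch below is a simplified model, not perf's data structures: `struct ids`, `ids_alloc`, `ids_add`, `ids_reset` and `ids_free` are hypothetical names. It assumes an array that is allocated once plus a separate element count, and shows why the count has to be reset before the array is reused.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the 'evsel->id' / 'evsel->ids' pair. */
struct ids {
	uint64_t *id;	/* array of event IDs */
	size_t nr;	/* number of IDs stored so far */
	size_t cap;	/* number of slots allocated */
};

/* Allocate room for 'cap' IDs and start with an empty array. */
static int ids_alloc(struct ids *s, size_t cap)
{
	s->id = calloc(cap, sizeof(*s->id));
	if (!s->id)
		return -1;
	s->nr = 0;
	s->cap = cap;
	return 0;
}

/* Append one ID; refuse instead of writing past the allocation. */
static int ids_add(struct ids *s, uint64_t id)
{
	if (s->nr >= s->cap)
		return -1;	/* a stale 'nr' ends up here */
	s->id[s->nr++] = id;
	return 0;
}

/* Forget the stored IDs so the array can be reused (the "close" step). */
static void ids_reset(struct ids *s)
{
	s->nr = 0;
}

/* Release the array and leave the structure in a defined empty state. */
static void ids_free(struct ids *s)
{
	free(s->id);
	s->id = NULL;
	s->nr = 0;
	s->cap = 0;
}
```

Here a second `ids_add()` on a one-slot array fails cleanly unless `ids_reset()` runs first. According to the commit message, the real code path has no such bounds check, so the stale count turns into a write to slot 1 and the heap corruption that free() later reports.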
* Re: [PATCH 56/63] perf tests: Fix out of bounds memory access 2019-11-07 19:00 ` [PATCH 56/63] perf tests: Fix out of bounds memory access Arnaldo Carvalho de Melo @ 2019-12-16 16:07 ` Naresh Kamboju 2019-12-16 16:20 ` Greg Kroah-Hartman 0 siblings, 1 reply; 133+ messages in thread From: Naresh Kamboju @ 2019-12-16 16:07 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, Greg Kroah-Hartman, Sasha Levin Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, open list, linux-perf-users, Leo Yan, Alexander Shishkin, Mark Rutland, Peter Zijlstra, Wang Nan, linux- stable, Arnaldo Carvalho de Melo, lkft-triage This patch merged into stable-rc tree and perf build failed on OE for Linaro builds for 5.3, 4.19, 4.14 and 4.9. Please find build error logs here, tests/backward-ring-buffer.c: In function 'test__backward_ring_buffer': tests/backward-ring-buffer.c:147:2: warning: implicit declaration of function 'evlist__close'; did you mean 'perf_evlist__close'? [-Wimplicit-function-declaration] evlist__close(evlist); ^~~~~~~~~~~~~ perf_evlist__close tests/backward-ring-buffer.c:147:2: warning: nested extern declaration of 'evlist__close' [-Wnested-externs] tests/backward-ring-buffer.c:149:8: warning: implicit declaration of function 'evlist__open'; did you mean 'perf_evlist__open'? 
[-Wimplicit-function-declaration] err = evlist__open(evlist); ^~~~~~~~~~~~ perf_evlist__open tests/backward-ring-buffer.c:149:8: warning: nested extern declaration of 'evlist__open' [-Wnested-externs] perf/1.0-r9/recipe-sysroot/usr/lib/python2.7/config/libpython2.7.a(posixmodule.o): In function `posix_tmpnam': /usr/src/debug/python/2.7.15-r1/Python-2.7.15/Modules/posixmodule.c:7648: warning: the use of `tmpnam_r' is dangerous, better use `mkstemp' perf/1.0-r9/recipe-sysroot/usr/lib/python2.7/config/libpython2.7.a(posixmodule.o): In function `posix_tempnam': /usr/src/debug/python/2.7.15-r1/Python-2.7.15/Modules/posixmodule.c:7595: warning: the use of `tempnam' is dangerous, better use `mkstemp' perf/1.0-r9/perf-1.0/perf-in.o: In function `test__backward_ring_buffer': perf/1.0-r9/perf-1.0/tools/perf/tests/backward-ring-buffer.c:147: undefined reference to `evlist__close' perf/1.0-r9/perf-1.0/tools/perf/tests/backward-ring-buffer.c:149: undefined reference to `evlist__open' Full log can be found at, https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-5.3/DISTRO=lkft,MACHINE=hikey,label=docker-lkft/72/consoleText https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-4.19/DISTRO=lkft,MACHINE=intel-corei7-64,label=docker-lkft/378/consoleText https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-4.14/DISTRO=lkft,MACHINE=intel-corei7-64,label=docker-lkft/675/consoleText https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-4.9/DISTRO=lkft,MACHINE=intel-corei7-64,label=docker-lkft/753/consoleText On Fri, 8 Nov 2019 at 00:38, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > From: Leo Yan <leo.yan@linaro.org> > > The test case 'Read backward ring buffer' failed on 32-bit architectures > which were found by LKFT perf testing. 
The test failed on arm32 x15 > device, qemu_arm32, qemu_i386, and found intermittent failure on i386; > the failure log is as below: > > 50: Read backward ring buffer : > --- start --- > test child forked, pid 510 > Using CPUID GenuineIntel-6-9E-9 > mmap size 1052672B > mmap size 8192B > Finished reading overwrite ring buffer: rewind > free(): invalid next size (fast) > test child interrupted > ---- end ---- > Read backward ring buffer: FAILED! > > The log hints there have issue for memory usage, thus free() reports > error 'invalid next size' and directly exit for the case. Finally, this > issue is root caused as out of bounds memory access for the data array > 'evsel->id'. > > The backward ring buffer test invokes do_test() twice. 'evsel->id' is > allocated at the first call with the flow: > > test__backward_ring_buffer() > `-> do_test() > `-> evlist__mmap() > `-> evlist__mmap_ex() > `-> perf_evsel__alloc_id() > > So 'evsel->id' is allocated with one item, and it will be used in > function perf_evlist__id_add(): > > evsel->id[0] = id > evsel->ids = 1 > > At the second call for do_test(), it skips to initialize 'evsel->id' > and reuses the array which is allocated in the first call. But > 'evsel->ids' contains the stale value. Thus: > > evsel->id[1] = id -> out of bound access > evsel->ids = 2 > > To fix this issue, we will use evlist__open() and evlist__close() pair > functions to prepare and cleanup context for evlist; so 'evsel->id' and > 'evsel->ids' can be initialized properly when invoke do_test() and avoid > the out of bounds memory access. 
> > Fixes: ee74701ed8ad ("perf tests: Add test to check backward ring buffer") > Signed-off-by: Leo Yan <leo.yan@linaro.org> > Reviewed-by: Jiri Olsa <jolsa@kernel.org> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Namhyung Kim <namhyung@kernel.org> > Cc: Naresh Kamboju <naresh.kamboju@linaro.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Wang Nan <wangnan0@huawei.com> > Cc: stable@vger.kernel.org # v4.10+ > Link: http://lore.kernel.org/lkml/20191107020244.2427-1-leo.yan@linaro.org > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > --- > tools/perf/tests/backward-ring-buffer.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c > index a4cd30c0beb3..15cea518f5ad 100644 > --- a/tools/perf/tests/backward-ring-buffer.c > +++ b/tools/perf/tests/backward-ring-buffer.c > @@ -148,6 +148,15 @@ int test__backward_ring_buffer(struct test *test __maybe_unused, int subtest __m > goto out_delete_evlist; > } > > + evlist__close(evlist); > + > + err = evlist__open(evlist); > + if (err < 0) { > + pr_debug("perf_evlist__open: %s\n", > + str_error_r(errno, sbuf, sizeof(sbuf))); > + goto out_delete_evlist; > + } > + > err = do_test(evlist, 1, &sample_count, &comm_count); > if (err != TEST_OK) > goto out_delete_evlist; > -- > 2.21.0 > ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [PATCH 56/63] perf tests: Fix out of bounds memory access 2019-12-16 16:07 ` Naresh Kamboju @ 2019-12-16 16:20 ` Greg Kroah-Hartman 0 siblings, 0 replies; 133+ messages in thread From: Greg Kroah-Hartman @ 2019-12-16 16:20 UTC (permalink / raw) To: Naresh Kamboju Cc: Arnaldo Carvalho de Melo, Sasha Levin, Ingo Molnar, Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, open list, linux-perf-users, Leo Yan, Alexander Shishkin, Mark Rutland, Peter Zijlstra, Wang Nan, linux- stable, Arnaldo Carvalho de Melo, lkft-triage On Mon, Dec 16, 2019 at 09:37:02PM +0530, Naresh Kamboju wrote: > This patch merged into stable-rc tree and perf build failed on OE for > Linaro builds for 5.3, 4.19, 4.14 and 4.9. > Please find build error logs here, > > tests/backward-ring-buffer.c: In function 'test__backward_ring_buffer': > tests/backward-ring-buffer.c:147:2: warning: implicit declaration of > function 'evlist__close'; did you mean 'perf_evlist__close'? > [-Wimplicit-function-declaration] > evlist__close(evlist); > ^~~~~~~~~~~~~ > perf_evlist__close > tests/backward-ring-buffer.c:147:2: warning: nested extern declaration > of 'evlist__close' [-Wnested-externs] > tests/backward-ring-buffer.c:149:8: warning: implicit declaration of > function 'evlist__open'; did you mean 'perf_evlist__open'? 
> [-Wimplicit-function-declaration] > err = evlist__open(evlist); > ^~~~~~~~~~~~ > perf_evlist__open > tests/backward-ring-buffer.c:149:8: warning: nested extern declaration > of 'evlist__open' [-Wnested-externs] > perf/1.0-r9/recipe-sysroot/usr/lib/python2.7/config/libpython2.7.a(posixmodule.o): > In function `posix_tmpnam': > /usr/src/debug/python/2.7.15-r1/Python-2.7.15/Modules/posixmodule.c:7648: > warning: the use of `tmpnam_r' is dangerous, better use `mkstemp' > perf/1.0-r9/recipe-sysroot/usr/lib/python2.7/config/libpython2.7.a(posixmodule.o): > In function `posix_tempnam': > /usr/src/debug/python/2.7.15-r1/Python-2.7.15/Modules/posixmodule.c:7595: > warning: the use of `tempnam' is dangerous, better use `mkstemp' > perf/1.0-r9/perf-1.0/perf-in.o: In function `test__backward_ring_buffer': > perf/1.0-r9/perf-1.0/tools/perf/tests/backward-ring-buffer.c:147: > undefined reference to `evlist__close' > perf/1.0-r9/perf-1.0/tools/perf/tests/backward-ring-buffer.c:149: > undefined reference to `evlist__open' > > Full log can be found at, > https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-5.3/DISTRO=lkft,MACHINE=hikey,label=docker-lkft/72/consoleText > https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-4.19/DISTRO=lkft,MACHINE=intel-corei7-64,label=docker-lkft/378/consoleText > https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-4.14/DISTRO=lkft,MACHINE=intel-corei7-64,label=docker-lkft/675/consoleText > https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-4.9/DISTRO=lkft,MACHINE=intel-corei7-64,label=docker-lkft/753/consoleText > Good catch, thanks, I'll go drop it from all of these queues. greg k-h ^ permalink raw reply [flat|nested] 133+ messages in thread
* [PATCH 57/63] perf diff: Don't use hack to skip column length calculation
From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Jin Yao, Alexander Shishkin, Andi Kleen, Jin Yao, Kan Liang, Peter Zijlstra, Arnaldo Carvalho de Melo

From: Jin Yao <yao.jin@linux.intel.com>

Previously we used a nasty hack to skip hists__calc_col_len() for block
entries, since that function is not suitable for block column length
calculation.

This patch removes the hack and adds a check at the entry of
hists__calc_col_len() that returns early in the block case.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191107074719.26139-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-diff.c | 11 ++---------
 tools/perf/util/hist.c    |  2 ++
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 5281629c27b1..faf99a81ad3e 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -765,13 +765,6 @@ static void block_hists_match(struct hists *hists_base,
 	}
 }
 
-static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)
-{
-	/* Skip the calculation of column length in output_resort */
-	he->filtered = true;
-	return 0;
-}
-
 static void hists__precompute(struct hists *hists)
 {
 	struct rb_root_cached *root;
@@ -820,8 +813,8 @@ static void hists__precompute(struct hists *hists)
 				if (bh->valid && pair_bh->valid) {
 					block_hists_match(&bh->block_hists,
 							  &pair_bh->block_hists);
-					hists__output_resort_cb(&pair_bh->block_hists,
-								NULL, filter_cb);
+					hists__output_resort(&pair_bh->block_hists,
+							     NULL);
 				}
 				break;
 			default:
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 679a1d75090c..daa6eef4fde0 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -80,6 +80,8 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 	int symlen;
 	u16 len;
 
+	if (h->block_info)
+		return;
 	/*
 	 * +4 accounts for '[x] ' priv level info
 	 * +2 accounts for 0x prefix on raw addresses
-- 
2.21.0
* [PATCH 58/63] perf block: Cleanup and refactor block info functions 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (55 preceding siblings ...) 2019-11-07 19:00 ` [PATCH 57/63] perf diff: Don't use hack to skip column length calculation Arnaldo Carvalho de Melo @ 2019-11-07 19:00 ` Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 59/63] perf hist: Count the total cycles of all samples Arnaldo Carvalho de Melo ` (5 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Jin Yao, Alexander Shishkin, Andi Kleen, Jin Yao, Kan Liang, Peter Zijlstra, Arnaldo Carvalho de Melo From: Jin Yao <yao.jin@linux.intel.com> We have already implemented some block-info related functions. Now it's time to do some cleanup, refactoring and move the functions and structures to new block-info.h/block-info.c. v4: --- Move code for skipping column length calculation to patch: 'perf diff: Don't use hack to skip column length calculation' v3: --- 1. Rename the patch title 2. Rename from block.h/block.c to block-info.h/block-info.c 3. Move more common part to block-info, such as block_info__process_sym. 4. 
Remove the nasty hack for skipping calculation of column length Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/builtin-diff.c | 107 +++-------------------------- tools/perf/util/Build | 1 + tools/perf/util/block-info.c | 129 +++++++++++++++++++++++++++++++++++ tools/perf/util/block-info.h | 43 ++++++++++++ tools/perf/util/hist.c | 1 + tools/perf/util/symbol.c | 22 ------ tools/perf/util/symbol.h | 24 ------- 7 files changed, 185 insertions(+), 142 deletions(-) create mode 100644 tools/perf/util/block-info.c create mode 100644 tools/perf/util/block-info.h diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index faf99a81ad3e..6728568fe5c4 100644 --- a/tools/perf/builtin-diff.c +++ b/tools/perf/builtin-diff.c @@ -24,6 +24,7 @@ #include "util/annotate.h" #include "util/map.h" #include "util/spark.h" +#include "util/block-info.h" #include <linux/err.h> #include <linux/zalloc.h> #include <subcmd/pager.h> @@ -98,8 +99,6 @@ static s64 compute_wdiff_w2; static const char *cpu_list; static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); -static struct addr_location dummy_al; - enum { COMPUTE_DELTA, COMPUTE_RATIO, @@ -537,41 +536,6 @@ static void hists__baseline_only(struct hists *hists) } } -static int64_t block_cmp(struct perf_hpp_fmt *fmt __maybe_unused, - struct hist_entry *left, struct hist_entry *right) -{ - struct block_info *bi_l = left->block_info; - struct block_info *bi_r = right->block_info; - int cmp; - - if (!bi_l->sym || !bi_r->sym) { - if (!bi_l->sym && !bi_r->sym) - return 0; - else if (!bi_l->sym) - return -1; - else - return 1; - } - - if 
(bi_l->sym == bi_r->sym) { - if (bi_l->start == bi_r->start) { - if (bi_l->end == bi_r->end) - return 0; - else - return (int64_t)(bi_r->end - bi_l->end); - } else - return (int64_t)(bi_r->start - bi_l->start); - } else { - cmp = strcmp(bi_l->sym->name, bi_r->sym->name); - return cmp; - } - - if (bi_l->sym->start != bi_r->sym->start) - return (int64_t)(bi_r->sym->start - bi_l->sym->start); - - return (int64_t)(bi_r->sym->end - bi_l->sym->end); -} - static int64_t block_cycles_diff_cmp(struct hist_entry *left, struct hist_entry *right) { @@ -600,67 +564,13 @@ static void init_block_hist(struct block_hist *bh) INIT_LIST_HEAD(&bh->block_fmt.list); INIT_LIST_HEAD(&bh->block_fmt.sort_list); - bh->block_fmt.cmp = block_cmp; + bh->block_fmt.cmp = block_info__cmp; bh->block_fmt.sort = block_sort; perf_hpp_list__register_sort_field(&bh->block_list, &bh->block_fmt); bh->valid = true; } -static void init_block_info(struct block_info *bi, struct symbol *sym, - struct cyc_hist *ch, int offset) -{ - bi->sym = sym; - bi->start = ch->start; - bi->end = offset; - bi->cycles = ch->cycles; - bi->cycles_aggr = ch->cycles_aggr; - bi->num = ch->num; - bi->num_aggr = ch->num_aggr; - - memcpy(bi->cycles_spark, ch->cycles_spark, - NUM_SPARKS * sizeof(u64)); -} - -static int process_block_per_sym(struct hist_entry *he) -{ - struct annotation *notes; - struct cyc_hist *ch; - struct block_hist *bh; - - if (!he->ms.map || !he->ms.sym) - return 0; - - notes = symbol__annotation(he->ms.sym); - if (!notes || !notes->src || !notes->src->cycles_hist) - return 0; - - bh = container_of(he, struct block_hist, he); - init_block_hist(bh); - - ch = notes->src->cycles_hist; - for (unsigned int i = 0; i < symbol__size(he->ms.sym); i++) { - if (ch[i].num_aggr) { - struct block_info *bi; - struct hist_entry *he_block; - - bi = block_info__new(); - if (!bi) - return -1; - - init_block_info(bi, he->ms.sym, &ch[i], i); - he_block = hists__add_entry_block(&bh->block_hists, - &dummy_al, bi); - if (!he_block) { - 
block_info__put(bi); - return -1; - } - } - } - - return 0; -} - static int block_pair_cmp(struct hist_entry *a, struct hist_entry *b) { struct block_info *bi_a = a->block_info; @@ -785,8 +695,11 @@ static void hists__precompute(struct hists *hists) he = rb_entry(next, struct hist_entry, rb_node_in); next = rb_next(&he->rb_node_in); - if (compute == COMPUTE_CYCLES) - process_block_per_sym(he); + if (compute == COMPUTE_CYCLES) { + bh = container_of(he, struct block_hist, he); + init_block_hist(bh); + block_info__process_sym(he, bh, NULL, 0); + } data__for_each_file_new(i, d) { pair = get_pair_data(he, d); @@ -805,10 +718,12 @@ static void hists__precompute(struct hists *hists) compute_wdiff(he, pair); break; case COMPUTE_CYCLES: - process_block_per_sym(pair); - bh = container_of(he, struct block_hist, he); pair_bh = container_of(pair, struct block_hist, he); + init_block_hist(pair_bh); + block_info__process_sym(pair, pair_bh, NULL, 0); + + bh = container_of(he, struct block_hist, he); if (bh->valid && pair_bh->valid) { block_hists_match(&bh->block_hists, diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 39814b1806a6..b8e05a147b2b 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -1,4 +1,5 @@ perf-y += annotate.o +perf-y += block-info.o perf-y += block-range.o perf-y += build-id.o perf-y += cacheline.o diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c new file mode 100644 index 000000000000..b9954a32b8f4 --- /dev/null +++ b/tools/perf/util/block-info.c @@ -0,0 +1,129 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <stdlib.h> +#include <string.h> +#include <linux/zalloc.h> +#include "block-info.h" +#include "sort.h" +#include "annotate.h" +#include "symbol.h" + +struct block_info *block_info__get(struct block_info *bi) +{ + if (bi) + refcount_inc(&bi->refcnt); + return bi; +} + +void block_info__put(struct block_info *bi) +{ + if (bi && refcount_dec_and_test(&bi->refcnt)) + free(bi); +} + +struct block_info 
*block_info__new(void) +{ + struct block_info *bi = zalloc(sizeof(*bi)); + + if (bi) + refcount_set(&bi->refcnt, 1); + return bi; +} + +int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused, + struct hist_entry *left, struct hist_entry *right) +{ + struct block_info *bi_l = left->block_info; + struct block_info *bi_r = right->block_info; + int cmp; + + if (!bi_l->sym || !bi_r->sym) { + if (!bi_l->sym && !bi_r->sym) + return 0; + else if (!bi_l->sym) + return -1; + else + return 1; + } + + if (bi_l->sym == bi_r->sym) { + if (bi_l->start == bi_r->start) { + if (bi_l->end == bi_r->end) + return 0; + else + return (int64_t)(bi_r->end - bi_l->end); + } else + return (int64_t)(bi_r->start - bi_l->start); + } else { + cmp = strcmp(bi_l->sym->name, bi_r->sym->name); + return cmp; + } + + if (bi_l->sym->start != bi_r->sym->start) + return (int64_t)(bi_r->sym->start - bi_l->sym->start); + + return (int64_t)(bi_r->sym->end - bi_l->sym->end); +} + +static void init_block_info(struct block_info *bi, struct symbol *sym, + struct cyc_hist *ch, int offset, + u64 total_cycles) +{ + bi->sym = sym; + bi->start = ch->start; + bi->end = offset; + bi->cycles = ch->cycles; + bi->cycles_aggr = ch->cycles_aggr; + bi->num = ch->num; + bi->num_aggr = ch->num_aggr; + bi->total_cycles = total_cycles; + + memcpy(bi->cycles_spark, ch->cycles_spark, + NUM_SPARKS * sizeof(u64)); +} + +int block_info__process_sym(struct hist_entry *he, struct block_hist *bh, + u64 *block_cycles_aggr, u64 total_cycles) +{ + struct annotation *notes; + struct cyc_hist *ch; + static struct addr_location al; + u64 cycles = 0; + + if (!he->ms.map || !he->ms.sym) + return 0; + + memset(&al, 0, sizeof(al)); + al.map = he->ms.map; + al.sym = he->ms.sym; + + notes = symbol__annotation(he->ms.sym); + if (!notes || !notes->src || !notes->src->cycles_hist) + return 0; + ch = notes->src->cycles_hist; + for (unsigned int i = 0; i < symbol__size(he->ms.sym); i++) { + if (ch[i].num_aggr) { + struct block_info *bi; + 
struct hist_entry *he_block; + + bi = block_info__new(); + if (!bi) + return -1; + + init_block_info(bi, he->ms.sym, &ch[i], i, + total_cycles); + cycles += bi->cycles_aggr / bi->num_aggr; + + he_block = hists__add_entry_block(&bh->block_hists, + &al, bi); + if (!he_block) { + block_info__put(bi); + return -1; + } + } + } + + if (block_cycles_aggr) + *block_cycles_aggr += cycles; + + return 0; +} diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h new file mode 100644 index 000000000000..d55dfc2fda6f --- /dev/null +++ b/tools/perf/util/block-info.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __PERF_BLOCK_H +#define __PERF_BLOCK_H + +#include <linux/types.h> +#include <linux/refcount.h> +#include "util/hist.h" +#include "util/symbol.h" + +struct block_info { + struct symbol *sym; + u64 start; + u64 end; + u64 cycles; + u64 cycles_aggr; + s64 cycles_spark[NUM_SPARKS]; + u64 total_cycles; + int num; + int num_aggr; + refcount_t refcnt; +}; + +struct block_hist; + +struct block_info *block_info__new(void); +struct block_info *block_info__get(struct block_info *bi); +void block_info__put(struct block_info *bi); + +static inline void __block_info__zput(struct block_info **bi) +{ + block_info__put(*bi); + *bi = NULL; +} + +#define block_info__zput(bi) __block_info__zput(&bi) + +int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused, + struct hist_entry *left, struct hist_entry *right); + +int block_info__process_sym(struct hist_entry *he, struct block_hist *bh, + u64 *block_cycles_aggr, u64 total_cycles); + +#endif /* __PERF_BLOCK_H */ diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index daa6eef4fde0..a7fa061987e4 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -18,6 +18,7 @@ #include "srcline.h" #include "symbol.h" #include "thread.h" +#include "block-info.h" #include "ui/progress.h" #include <errno.h> #include <math.h> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c 
index 4ad39cc6368d..2764863212b1 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -2351,25 +2351,3 @@ struct mem_info *mem_info__new(void) refcount_set(&mi->refcnt, 1); return mi; } - -struct block_info *block_info__get(struct block_info *bi) -{ - if (bi) - refcount_inc(&bi->refcnt); - return bi; -} - -void block_info__put(struct block_info *bi) -{ - if (bi && refcount_dec_and_test(&bi->refcnt)) - free(bi); -} - -struct block_info *block_info__new(void) -{ - struct block_info *bi = zalloc(sizeof(*bi)); - - if (bi) - refcount_set(&bi->refcnt, 1); - return bi; -} diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h index cc2a89b99d3d..c3bd16d75d5d 100644 --- a/tools/perf/util/symbol.h +++ b/tools/perf/util/symbol.h @@ -106,18 +106,6 @@ struct ref_reloc_sym { u64 unrelocated_addr; }; -struct block_info { - struct symbol *sym; - u64 start; - u64 end; - u64 cycles; - u64 cycles_aggr; - s64 cycles_spark[NUM_SPARKS]; - int num; - int num_aggr; - refcount_t refcnt; -}; - struct addr_location { struct machine *machine; struct thread *thread; @@ -291,16 +279,4 @@ static inline void __mem_info__zput(struct mem_info **mi) #define mem_info__zput(mi) __mem_info__zput(&mi) -struct block_info *block_info__new(void); -struct block_info *block_info__get(struct block_info *bi); -void block_info__put(struct block_info *bi); - -static inline void __block_info__zput(struct block_info **bi) -{ - block_info__put(*bi); - *bi = NULL; -} - -#define block_info__zput(bi) __block_info__zput(&bi) - #endif /* __PERF_SYMBOL */ -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
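The new/get/put/zput helpers that this patch moves into block-info.c follow a common reference-counting pattern. A minimal single-threaded sketch of that pattern is below; `struct blk` and the `blk_*` names are hypothetical, a plain `int` stands in for the kernel's `refcount_t`, and `blk_put()` returns a value only so that the behaviour is observable (the helper in the patch returns void).

```c
#include <stdlib.h>

/* Hypothetical refcounted object; a plain int stands in for refcount_t. */
struct blk {
	unsigned long long start;
	unsigned long long end;
	int refcnt;	/* not atomic: single-threaded sketch only */
};

/* Allocate a zeroed object that starts with one reference. */
static struct blk *blk_new(void)
{
	struct blk *b = calloc(1, sizeof(*b));

	if (b)
		b->refcnt = 1;
	return b;
}

/* Take an additional reference; NULL is passed through unchanged. */
static struct blk *blk_get(struct blk *b)
{
	if (b)
		b->refcnt++;
	return b;
}

/* Drop a reference; returns 1 if the object was freed, 0 otherwise. */
static int blk_put(struct blk *b)
{
	if (b && --b->refcnt == 0) {
		free(b);
		return 1;
	}
	return 0;
}

/* Drop a reference and clear the caller's pointer to avoid reuse. */
static void blk_zput(struct blk **b)
{
	blk_put(*b);
	*b = NULL;
}
```

The error path in block_info__process_sym() relies on the same contract: the object returned by the constructor holds one reference, so a single put releases it when adding the entry fails.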
* [PATCH 59/63] perf hist: Count the total cycles of all samples
From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Jin Yao, Alexander Shishkin, Andi Kleen, Jin Yao, Kan Liang, Peter Zijlstra, Arnaldo Carvalho de Melo

From: Jin Yao <yao.jin@linux.intel.com>

We can get the per-sample cycles via hist__account_cycles(). It's also
useful to know the total cycles of all samples, so that the cycles
coverage of a single program block can be computed later. For example:

  coverage = per block sampled cycles / total sampled cycles

This patch adds a new argument 'total_cycles' to hist__account_cycles();
the cycles of each sample are accumulated into it.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/builtin-annotate.c | 2 +- tools/perf/builtin-diff.c | 3 ++- tools/perf/builtin-report.c | 2 +- tools/perf/builtin-top.c | 3 ++- tools/perf/util/hist.c | 6 +++++- tools/perf/util/hist.h | 3 ++- 6 files changed, 13 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 8db8fc9bddef..6ab0cc45b287 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -201,7 +201,7 @@ static int process_branch_callback(struct evsel *evsel, if (a.map != NULL) a.map->dso->hit = 1; - hist__account_cycles(sample->branch_stack, al, sample, false); + hist__account_cycles(sample->branch_stack, al, sample, false, NULL); ret = hist_entry_iter__add(&iter, &a, PERF_MAX_STACK_DEPTH, ann); return ret; diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index 6728568fe5c4..376dbf10ad64 100644 --- a/tools/perf/builtin-diff.c +++ b/tools/perf/builtin-diff.c @@ -426,7 +426,8 @@ static int diff__process_sample_event(struct perf_tool *tool, goto out_put; } - hist__account_cycles(sample->branch_stack, &al, sample, false); + hist__account_cycles(sample->branch_stack, &al, sample, false, + NULL); } /* diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 3bbad039abf2..bc15b9dcccd6 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -292,7 +292,7 @@ static int process_sample_event(struct perf_tool *tool, if (ui__has_annotation() || rep->symbol_ipc) { hist__account_cycles(sample->branch_stack, &al, sample, - 
rep->nonany_branch_mode); + rep->nonany_branch_mode, NULL); } ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep); diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index d96f24c8770d..14c52e4d47f6 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -725,7 +725,8 @@ static int hist_iter__top_callback(struct hist_entry_iter *iter, perf_top__record_precise_ip(top, he, iter->sample, evsel, al->addr); hist__account_cycles(iter->sample->branch_stack, al, iter->sample, - !(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY)); + !(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY), + NULL); return 0; } diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index a7fa061987e4..0e27d6830011 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -2572,7 +2572,8 @@ int hists__unlink(struct hists *hists) } void hist__account_cycles(struct branch_stack *bs, struct addr_location *al, - struct perf_sample *sample, bool nonany_branch_mode) + struct perf_sample *sample, bool nonany_branch_mode, + u64 *total_cycles) { struct branch_info *bi; @@ -2599,6 +2600,9 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al, nonany_branch_mode ? 
NULL : prev, bi[i].flags.cycles); prev = &bi[i].to; + + if (total_cycles) + *total_cycles += bi[i].flags.cycles; } free(bi); } diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index 6a186b668303..4d87c7b4c1b2 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -527,7 +527,8 @@ unsigned int hists__sort_list_width(struct hists *hists); unsigned int hists__overhead_width(struct hists *hists); void hist__account_cycles(struct branch_stack *bs, struct addr_location *al, - struct perf_sample *sample, bool nonany_branch_mode); + struct perf_sample *sample, bool nonany_branch_mode, + u64 *total_cycles); struct option; int parse_filter_percentage(const struct option *opt, const char *arg, int unset); -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
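The accumulation pattern and the coverage formula from the patch above can be sketched outside perf as follows. This is only an illustration under stated assumptions: struct branch_sample, account_cycles() and block_coverage() are hypothetical names, not perf APIs. The one thing taken from the patch is the optional out-pointer that callers may leave as NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one LBR branch entry; only the cycle count matters here. */
struct branch_sample {
	uint64_t cycles;
};

/*
 * Walk the branch entries of one sample and, if the caller passed a
 * non-NULL accumulator, add each entry's cycles to it. Callers that do
 * not need the total simply pass NULL.
 */
void account_cycles(const struct branch_sample *bs, size_t nr,
		    uint64_t *total_cycles)
{
	for (size_t i = 0; i < nr; i++) {
		if (total_cycles)
			*total_cycles += bs[i].cycles;
	}
}

/* coverage = per block sampled cycles / total sampled cycles */
double block_coverage(uint64_t block_cycles, uint64_t total_cycles)
{
	if (!total_cycles)
		return 0.0;	/* avoid dividing by zero when nothing was sampled */
	return (double)block_cycles / (double)total_cycles;
}
```

Making the accumulator optional keeps the existing callers unchanged apart from the extra NULL argument, which is how the annotate, diff, report and top call sites are updated in this patch.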
* [PATCH 60/63] perf hist: Support block formats with compare/sort/display 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (57 preceding siblings ...) 2019-11-07 19:00 ` [PATCH 59/63] perf hist: Count the total cycles of all samples Arnaldo Carvalho de Melo @ 2019-11-07 19:00 ` Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 61/63] perf report: Sort by sampled cycles percent per block for stdio Arnaldo Carvalho de Melo ` (3 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Jin Yao, Alexander Shishkin, Andi Kleen, Jin Yao, Kan Liang, Peter Zijlstra, Arnaldo Carvalho de Melo From: Jin Yao <yao.jin@linux.intel.com> This patch provides helper routines to support new columns for block info output. The new columns are: Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object v5: --- 1. Move more block related functions from builtin-report.c to block-info.c 2. Set ms (map+sym) in block hist_entry, because this info is needed for reporting the block range (i.e. source line). Committer notes: Remove the unused set_fmt() function, as some builds were failing with: util/block-info.c:396:20: error: unused function 'set_fmt' [-Werror,-Wunused-function] static inline void set_fmt(struct block_fmt *block_fmt, ^ 1 error generated. 
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/block-info.c | 310 +++++++++++++++++++++++++++++++++++ tools/perf/util/block-info.h | 33 +++- tools/perf/util/hist.c | 4 + 3 files changed, 345 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c index b9954a32b8f4..4a7bac95231e 100644 --- a/tools/perf/util/block-info.c +++ b/tools/perf/util/block-info.c @@ -6,6 +6,40 @@ #include "sort.h" #include "annotate.h" #include "symbol.h" +#include "dso.h" +#include "map.h" +#include "srcline.h" +#include "evlist.h" + +static struct block_header_column { + const char *name; + int width; +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = { + [PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT] = { + .name = "Sampled Cycles%", + .width = 15, + }, + [PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = { + .name = "Sampled Cycles", + .width = 14, + }, + [PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = { + .name = "Avg Cycles%", + .width = 11, + }, + [PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = { + .name = "Avg Cycles", + .width = 10, + }, + [PERF_HPP_REPORT__BLOCK_RANGE] = { + .name = "[Program Block Range]", + .width = 70, + }, + [PERF_HPP_REPORT__BLOCK_DSO] = { + .name = "Shared Object", + .width = 20, + } +}; struct block_info *block_info__get(struct block_info *bi) { @@ -127,3 +161,279 @@ int block_info__process_sym(struct hist_entry *he, struct block_hist *bh, return 0; } + +static int block_column_header(struct perf_hpp_fmt *fmt, + struct perf_hpp *hpp, + struct hists *hists __maybe_unused, + int line __maybe_unused, + int *span __maybe_unused) +{ + struct 
block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt); + + return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, + block_fmt->header); +} + +static int block_column_width(struct perf_hpp_fmt *fmt, + struct perf_hpp *hpp __maybe_unused, + struct hists *hists __maybe_unused) +{ + struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt); + + return block_fmt->width; +} + +static int block_total_cycles_pct_entry(struct perf_hpp_fmt *fmt, + struct perf_hpp *hpp, + struct hist_entry *he) +{ + struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt); + struct block_info *bi = he->block_info; + double ratio = 0.0; + char buf[16]; + + if (block_fmt->total_cycles) + ratio = (double)bi->cycles / (double)block_fmt->total_cycles; + + sprintf(buf, "%.2f%%", 100.0 * ratio); + + return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf); +} + +static int64_t block_total_cycles_pct_sort(struct perf_hpp_fmt *fmt, + struct hist_entry *left, + struct hist_entry *right) +{ + struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt); + struct block_info *bi_l = left->block_info; + struct block_info *bi_r = right->block_info; + double l, r; + + if (block_fmt->total_cycles) { + l = ((double)bi_l->cycles / + (double)block_fmt->total_cycles) * 100000.0; + r = ((double)bi_r->cycles / + (double)block_fmt->total_cycles) * 100000.0; + return (int64_t)l - (int64_t)r; + } + + return 0; +} + +static void cycles_string(u64 cycles, char *buf, int size) +{ + if (cycles >= 1000000) + scnprintf(buf, size, "%.1fM", (double)cycles / 1000000.0); + else if (cycles >= 1000) + scnprintf(buf, size, "%.1fK", (double)cycles / 1000.0); + else + scnprintf(buf, size, "%1d", cycles); +} + +static int block_cycles_lbr_entry(struct perf_hpp_fmt *fmt, + struct perf_hpp *hpp, struct hist_entry *he) +{ + struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt); + struct block_info *bi = he->block_info; + char cycles_buf[16]; + + 
cycles_string(bi->cycles_aggr, cycles_buf, sizeof(cycles_buf)); + + return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, + cycles_buf); +} + +static int block_cycles_pct_entry(struct perf_hpp_fmt *fmt, + struct perf_hpp *hpp, struct hist_entry *he) +{ + struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt); + struct block_info *bi = he->block_info; + double ratio = 0.0; + u64 avg; + char buf[16]; + + if (block_fmt->block_cycles && bi->num_aggr) { + avg = bi->cycles_aggr / bi->num_aggr; + ratio = (double)avg / (double)block_fmt->block_cycles; + } + + sprintf(buf, "%.2f%%", 100.0 * ratio); + + return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf); +} + +static int block_avg_cycles_entry(struct perf_hpp_fmt *fmt, + struct perf_hpp *hpp, + struct hist_entry *he) +{ + struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt); + struct block_info *bi = he->block_info; + char cycles_buf[16]; + + cycles_string(bi->cycles_aggr / bi->num_aggr, cycles_buf, + sizeof(cycles_buf)); + + return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, + cycles_buf); +} + +static int block_range_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp, + struct hist_entry *he) +{ + struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt); + struct block_info *bi = he->block_info; + char buf[128]; + char *start_line, *end_line; + + symbol_conf.disable_add2line_warn = true; + + start_line = map__srcline(he->ms.map, bi->sym->start + bi->start, + he->ms.sym); + + end_line = map__srcline(he->ms.map, bi->sym->start + bi->end, + he->ms.sym); + + if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { + scnprintf(buf, sizeof(buf), "[%s -> %s]", + start_line, end_line); + } else { + scnprintf(buf, sizeof(buf), "[%7lx -> %7lx]", + bi->start, bi->end); + } + + free_srcline(start_line); + free_srcline(end_line); + + return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf); +} + +static int 
block_dso_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp, + struct hist_entry *he) +{ + struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt); + struct map *map = he->ms.map; + + if (map && map->dso) { + return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, + map->dso->short_name); + } + + return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, + "[unknown]"); +} + +static void init_block_header(struct block_fmt *block_fmt) +{ + struct perf_hpp_fmt *fmt = &block_fmt->fmt; + + BUG_ON(block_fmt->idx >= PERF_HPP_REPORT__BLOCK_MAX_INDEX); + + block_fmt->header = block_columns[block_fmt->idx].name; + block_fmt->width = block_columns[block_fmt->idx].width; + + fmt->header = block_column_header; + fmt->width = block_column_width; +} + +static void hpp_register(struct block_fmt *block_fmt, int idx, + struct perf_hpp_list *hpp_list) +{ + struct perf_hpp_fmt *fmt = &block_fmt->fmt; + + block_fmt->idx = idx; + INIT_LIST_HEAD(&fmt->list); + INIT_LIST_HEAD(&fmt->sort_list); + + switch (idx) { + case PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT: + fmt->entry = block_total_cycles_pct_entry; + fmt->cmp = block_info__cmp; + fmt->sort = block_total_cycles_pct_sort; + break; + case PERF_HPP_REPORT__BLOCK_LBR_CYCLES: + fmt->entry = block_cycles_lbr_entry; + break; + case PERF_HPP_REPORT__BLOCK_CYCLES_PCT: + fmt->entry = block_cycles_pct_entry; + break; + case PERF_HPP_REPORT__BLOCK_AVG_CYCLES: + fmt->entry = block_avg_cycles_entry; + break; + case PERF_HPP_REPORT__BLOCK_RANGE: + fmt->entry = block_range_entry; + break; + case PERF_HPP_REPORT__BLOCK_DSO: + fmt->entry = block_dso_entry; + break; + default: + return; + } + + init_block_header(block_fmt); + perf_hpp_list__column_register(hpp_list, fmt); +} + +static void register_block_columns(struct perf_hpp_list *hpp_list, + struct block_fmt *block_fmts) +{ + for (int i = 0; i < PERF_HPP_REPORT__BLOCK_MAX_INDEX; i++) + hpp_register(&block_fmts[i], i, hpp_list); +} + +static void init_block_hist(struct 
block_hist *bh, struct block_fmt *block_fmts) +{ + __hists__init(&bh->block_hists, &bh->block_list); + perf_hpp_list__init(&bh->block_list); + bh->block_list.nr_header_lines = 1; + + register_block_columns(&bh->block_list, block_fmts); + + perf_hpp_list__register_sort_field(&bh->block_list, + &block_fmts[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT].fmt); +} + +static void process_block_report(struct hists *hists, + struct block_report *block_report, + u64 total_cycles) +{ + struct rb_node *next = rb_first_cached(&hists->entries); + struct block_hist *bh = &block_report->hist; + struct hist_entry *he; + + init_block_hist(bh, block_report->fmts); + + while (next) { + he = rb_entry(next, struct hist_entry, rb_node); + block_info__process_sym(he, bh, &block_report->cycles, + total_cycles); + next = rb_next(&he->rb_node); + } + + for (int i = 0; i < PERF_HPP_REPORT__BLOCK_MAX_INDEX; i++) { + block_report->fmts[i].total_cycles = total_cycles; + block_report->fmts[i].block_cycles = block_report->cycles; + } + + hists__output_resort(&bh->block_hists, NULL); +} + +struct block_report *block_info__create_report(struct evlist *evlist, + u64 total_cycles) +{ + struct block_report *block_reports; + int nr_hists = evlist->core.nr_entries, i = 0; + struct evsel *pos; + + block_reports = calloc(nr_hists, sizeof(struct block_report)); + if (!block_reports) + return NULL; + + evlist__for_each_entry(evlist, pos) { + struct hists *hists = evsel__hists(pos); + + process_block_report(hists, &block_reports[i], total_cycles); + i++; + } + + return block_reports; +} diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h index d55dfc2fda6f..b5266588d476 100644 --- a/tools/perf/util/block-info.h +++ b/tools/perf/util/block-info.h @@ -4,8 +4,9 @@ #include <linux/types.h> #include <linux/refcount.h> -#include "util/hist.h" -#include "util/symbol.h" +#include "hist.h" +#include "symbol.h" +#include "sort.h" struct block_info { struct symbol *sym; @@ -20,6 +21,31 @@ struct 
block_info { refcount_t refcnt; }; +struct block_fmt { + struct perf_hpp_fmt fmt; + int idx; + int width; + const char *header; + u64 total_cycles; + u64 block_cycles; +}; + +enum { + PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT, + PERF_HPP_REPORT__BLOCK_LBR_CYCLES, + PERF_HPP_REPORT__BLOCK_CYCLES_PCT, + PERF_HPP_REPORT__BLOCK_AVG_CYCLES, + PERF_HPP_REPORT__BLOCK_RANGE, + PERF_HPP_REPORT__BLOCK_DSO, + PERF_HPP_REPORT__BLOCK_MAX_INDEX +}; + +struct block_report { + struct block_hist hist; + u64 cycles; + struct block_fmt fmts[PERF_HPP_REPORT__BLOCK_MAX_INDEX]; +}; + struct block_hist; struct block_info *block_info__new(void); @@ -40,4 +66,7 @@ int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused, int block_info__process_sym(struct hist_entry *he, struct block_hist *bh, u64 *block_cycles_aggr, u64 total_cycles); +struct block_report *block_info__create_report(struct evlist *evlist, + u64 total_cycles); + #endif /* __PERF_BLOCK_H */ diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index 0e27d6830011..7cf137b0451b 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -758,6 +758,10 @@ struct hist_entry *hists__add_entry_block(struct hists *hists, struct hist_entry entry = { .block_info = block_info, .hists = hists, + .ms = { + .map = al->map, + .sym = al->sym, + }, }, *he = hists__findnew_entry(hists, &entry, al, false); return he; -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
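The K/M abbreviation used for the 'Sampled Cycles' and 'Avg Cycles' columns (e.g. 1.7M, 544.5K, 607) can be tried in isolation with a sketch like the one below. It is an approximation, not the perf code: format_cycles() is a hypothetical name, and it uses the standard snprintf() with PRIu64 where the patch uses perf's scnprintf() helper.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Render a cycle count the way the block columns show it: "M" from one
 * million up, "K" from one thousand up, plain decimal below that.
 */
void format_cycles(uint64_t cycles, char *buf, size_t size)
{
	if (cycles >= 1000000)
		snprintf(buf, size, "%.1fM", (double)cycles / 1000000.0);
	else if (cycles >= 1000)
		snprintf(buf, size, "%.1fK", (double)cycles / 1000.0);
	else
		snprintf(buf, size, "%" PRIu64, cycles);
}
```

The caller supplies the buffer and its size, for instance a char buf[16] passed together with sizeof(buf), which matches the buffer size the entry callbacks in the patch use.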
* [PATCH 61/63] perf report: Sort by sampled cycles percent per block for stdio 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (58 preceding siblings ...) 2019-11-07 19:00 ` [PATCH 60/63] perf hist: Support block formats with compare/sort/display Arnaldo Carvalho de Melo @ 2019-11-07 19:00 ` Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 62/63] perf report: Support --percent-limit for --total-cycles Arnaldo Carvalho de Melo ` (2 subsequent siblings) 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Jin Yao, Arnaldo Carvalho de Melo, Alexander Shishkin, Andi Kleen, Jin Yao, Kan Liang, Peter Zijlstra From: Jin Yao <yao.jin@linux.intel.com> It would be useful to support sorting for all blocks by the sampled cycles percent per block. This is useful to concentrate on the globally hottest blocks. This patch implements a new option "--total-cycles" which sorts all blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent: percent = block sampled cycles aggregation / total sampled cycles Note that, this patch only supports "--stdio" mode. For example, # perf record -b ./div # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 2M of event 'cycles' # Event count (approx.): 2753248 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ................................................ ................. 
# 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so 0.25% 182.5K 0.02% 1 [random_r.c:388 -> random_r.c:391] libc-2.27.so 0.00% 48 1.07% 48 [x86_pmu_enable+284 -> x86_pmu_enable+298] [kernel.kallsyms] 0.00% 74 1.64% 74 [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92] [kernel.kallsyms] 0.00% 73 1.62% 73 [vm_mmap+0 -> vm_mmap+48] [kernel.kallsyms] 0.00% 63 0.69% 31 [up_write+0 -> up_write+34] [kernel.kallsyms] 0.00% 13 0.29% 13 [setup_arg_pages+396 -> setup_arg_pages+413] [kernel.kallsyms] 0.00% 3 0.07% 3 [setup_arg_pages+418 -> setup_arg_pages+450] [kernel.kallsyms] 0.00% 616 6.84% 308 [security_mmap_file+0 -> security_mmap_file+72] [kernel.kallsyms] 0.00% 23 0.51% 23 [security_mmap_file+77 -> security_mmap_file+87] [kernel.kallsyms] 0.00% 4 0.02% 1 [sched_clock+0 -> sched_clock+4] [kernel.kallsyms] 0.00% 4 0.02% 1 [sched_clock+9 -> sched_clock+12] [kernel.kallsyms] 0.00% 1 0.02% 1 [rcu_nmi_exit+0 -> rcu_nmi_exit+9] [kernel.kallsyms] Committer testing: This should provide material for hours of endless joy, both from looking for suspicious 
things in the implementation of this patch, such as the top one: # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] As well as from things that look legit: # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux] :-) Very short system wide taken branches session: # perf record -h -b Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -b, --branch-any sample any taken branches # # perf record -b ^C[ perf record: Woken up 595 times to write data ] [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ] # # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY # # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ...................................................................... .................... 
# 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] 0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux] 0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux] 0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux] 0.30% 260.8K 0.07% 564 [clear_page_64.S:47 -> clear_page_64.S:50] [kernel.vmlinux] 0.28% 215.3K 0.05% 369 [traps.c:623 -> traps.c:628] [kernel.vmlinux] 0.23% 178.1K 0.04% 278 [entry_64.S:271 -> entry_64.S:275] [kernel.vmlinux] 0.20% 152.6K 0.09% 706 [paravirt.c:177 -> paravirt.c:179] [kernel.vmlinux] 0.20% 155.8K 0.05% 373 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux] 0.18% 136.6K 0.03% 222 [msr.h:105 -> msr.h:166] [kernel.vmlinux] 0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux] 0.16% 118.3K 0.01% 44 [entry_64.S:632 -> entry_64.S:657] [kernel.vmlinux] 0.14% 104.5K 0.00% 28 [rwsem.c:1541 -> rwsem.c:1544] [kernel.vmlinux] 0.13% 99.2K 0.01% 53 [spinlock.c:150 -> spinlock.c:152] [kernel.vmlinux] 0.13% 95.5K 0.00% 35 [swap.c:456 -> swap.c:471] [kernel.vmlinux] 0.12% 96.2K 0.05% 407 [copy_user_64.S:175 -> copy_user_64.S:209] [kernel.vmlinux] 0.11% 85.9K 0.00% 31 [swap.c:400 -> page-flags.h:188] [kernel.vmlinux] 0.10% 73.0K 0.01% 52 [paravirt.h:763 -> list.h:131] [kernel.vmlinux] 0.07% 56.2K 0.03% 214 [filemap.c:1524 -> filemap.c:1557] [kernel.vmlinux] 0.07% 54.2K 0.02% 145 [memory.c:1032 -> memory.c:1049] [kernel.vmlinux] 0.07% 50.3K 0.00% 39 [mmzone.c:49 -> mmzone.c:69] [kernel.vmlinux] 0.06% 48.3K 0.01% 40 [paravirt.h:768 -> page_alloc.c:3304] [kernel.vmlinux] 0.06% 46.7K 0.02% 155 [memory.c:1032 -> memory.c:1056] [kernel.vmlinux] 0.06% 46.9K 0.01% 103 [swap.c:867 -> swap.c:902] [kernel.vmlinux] 0.06% 47.8K 0.00% 34 [entry_64.S:1201 -> entry_64.S:1202] 
[kernel.vmlinux] ----------------------------------------------------------- v7: --- Use use_browser in report__browse_block_hists for supporting stdio and potential tui mode. v6: --- Create report__browse_block_hists in block-info.c (codes are moved from builtin-report.c). It's called from perf_evlist__tty_browse_hists. v5: --- 1. Move all block functions to block-info.c 2. Move the code of setting ms in block hist_entry to other patch. v4: --- 1. Use new option '--total-cycles' to replace '-s total_cycles' in v3. 2. Move block info collection out of block info printing. v3: --- 1. Use common function block_info__process_sym to process the blocks per symbol. 2. Remove the nasty hack for skipping calculation of column length 3. Some minor cleanup Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/Documentation/perf-report.txt | 11 ++++++ tools/perf/builtin-report.c | 44 ++++++++++++++++++++++-- tools/perf/ui/stdio/hist.c | 22 ++++++++++++ tools/perf/util/block-info.c | 17 +++++++++ tools/perf/util/block-info.h | 4 +++ tools/perf/util/symbol_conf.h | 1 + 6 files changed, 96 insertions(+), 3 deletions(-) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 7315f155803f..8dbe2119686a 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -525,6 +525,17 @@ include::itrace.txt[] Configure time quantum for time sort key. Default 100ms. Accepts s, us, ms, ns units. 
+--total-cycles:: + When --total-cycles is specified, it supports sorting for all blocks by + 'Sampled Cycles%'. This is useful to concentrate on the globally hottest + blocks. In output, there are some new columns: + + 'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles + 'Sampled Cycles' - block sampled cycles aggregation + 'Avg Cycles%' - block average sampled cycles / sum of total block average + sampled cycles + 'Avg Cycles' - block average sampled cycles + include::callchain-overhead-calculation.txt[] SEE ALSO diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index bc15b9dcccd6..992b18bdd723 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -51,6 +51,7 @@ #include "util/util.h" // perf_tip() #include "ui/ui.h" #include "ui/progress.h" +#include "util/block-info.h" #include <dlfcn.h> #include <errno.h> @@ -96,10 +97,13 @@ struct report { float min_percent; u64 nr_entries; u64 queue_size; + u64 total_cycles; int socket_filter; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); struct branch_type_stat brtype_stat; bool symbol_ipc; + bool total_cycles_mode; + struct block_report *block_reports; }; static int report__config(const char *var, const char *value, void *cb) @@ -290,9 +294,10 @@ static int process_sample_event(struct perf_tool *tool, if (al.map != NULL) al.map->dso->hit = 1; - if (ui__has_annotation() || rep->symbol_ipc) { + if (ui__has_annotation() || rep->symbol_ipc || rep->total_cycles_mode) { hist__account_cycles(sample->branch_stack, &al, sample, - rep->nonany_branch_mode, NULL); + rep->nonany_branch_mode, + &rep->total_cycles); } ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep); @@ -485,6 +490,7 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist, const char *help) { struct evsel *pos; + int i = 0; if (!quiet) { fprintf(stdout, "#\n# Total Lost Samples: %" PRIu64 "\n#\n", @@ -500,6 +506,13 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist, 
continue; hists__fprintf_nr_sample_events(hists, rep, evname, stdout); + + if (rep->total_cycles_mode) { + report__browse_block_hists(&rep->block_reports[i++].hist, + 0, pos); + continue; + } + hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout, !(symbol_conf.use_callchain || symbol_conf.show_branchflag_count)); @@ -925,6 +938,13 @@ static int __cmd_report(struct report *rep) report__output_resort(rep); + if (rep->total_cycles_mode) { + rep->block_reports = block_info__create_report(session->evlist, + rep->total_cycles); + if (!rep->block_reports) + return -1; + } + return report__browse_hists(rep); } @@ -1209,6 +1229,8 @@ int cmd_report(int argc, const char **argv) "Set time quantum for time sort key (default 100ms)", parse_time_quantum), OPTS_EVSWITCH(&report.evswitch), + OPT_BOOLEAN(0, "total-cycles", &report.total_cycles_mode, + "Sort all blocks by 'Sampled Cycles%'"), OPT_END() }; struct perf_data data = { @@ -1371,6 +1393,17 @@ int cmd_report(int argc, const char **argv) goto error; } + if (report.total_cycles_mode) { + if (sort__mode != SORT_MODE__BRANCH) + report.total_cycles_mode = false; + else if (!report.use_stdio) { + pr_err("Error: --total-cycles can be only used together with --stdio\n"); + goto error; + } else { + sort_order = "sym"; + } + } + if (strcmp(input_name, "-") != 0) setup_browser(true); else @@ -1421,7 +1454,8 @@ int cmd_report(int argc, const char **argv) * so don't allocate extra space that won't be used in the stdio * implementation. 
*/ - if (ui__has_annotation() || report.symbol_ipc) { + if (ui__has_annotation() || report.symbol_ipc || + report.total_cycles_mode) { ret = symbol__annotation_init(); if (ret < 0) goto error; @@ -1482,6 +1516,10 @@ int cmd_report(int argc, const char **argv) itrace_synth_opts__clear_time_range(&itrace_synth_opts); zfree(&report.ptime_range); } + + if (report.block_reports) + zfree(&report.block_reports); + zstd_fini(&(session->zstd_data)); perf_session__delete(session); return ret; diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c index 5365606e9dad..655ef7708cd0 100644 --- a/tools/perf/ui/stdio/hist.c +++ b/tools/perf/ui/stdio/hist.c @@ -558,6 +558,25 @@ static int hist_entry__block_fprintf(struct hist_entry *he, return ret; } +static int hist_entry__individual_block_fprintf(struct hist_entry *he, + char *bf, size_t size, + FILE *fp) +{ + int ret = 0; + + struct perf_hpp hpp = { + .buf = bf, + .size = size, + .skip = false, + }; + + hist_entry__snprintf(he, &hpp); + if (!hpp.skip) + ret += fprintf(fp, "%s\n", bf); + + return ret; +} + static int hist_entry__fprintf(struct hist_entry *he, size_t size, char *bf, size_t bfsz, FILE *fp, bool ignore_callchains) @@ -580,6 +599,9 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size, if (symbol_conf.report_block) return hist_entry__block_fprintf(he, bf, size, fp); + if (symbol_conf.report_individual_block) + return hist_entry__individual_block_fprintf(he, bf, size, fp); + hist_entry__snprintf(he, &hpp); ret = fprintf(fp, "%s\n", bf); diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c index 4a7bac95231e..ba891751a6ed 100644 --- a/tools/perf/util/block-info.c +++ b/tools/perf/util/block-info.c @@ -437,3 +437,20 @@ struct block_report *block_info__create_report(struct evlist *evlist, return block_reports; } + +int report__browse_block_hists(struct block_hist *bh, float min_percent, + struct evsel *evsel __maybe_unused) +{ + switch (use_browser) { + case 0: + 
symbol_conf.report_individual_block = true; + hists__fprintf(&bh->block_hists, true, 0, 0, min_percent, + stdout, true); + hists__delete_entries(&bh->block_hists); + return 0; + default: + return -1; + } + + return 0; +} diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h index b5266588d476..8309297a6e8f 100644 --- a/tools/perf/util/block-info.h +++ b/tools/perf/util/block-info.h @@ -7,6 +7,7 @@ #include "hist.h" #include "symbol.h" #include "sort.h" +#include "ui/ui.h" struct block_info { struct symbol *sym; @@ -69,4 +70,7 @@ int block_info__process_sym(struct hist_entry *he, struct block_hist *bh, struct block_report *block_info__create_report(struct evlist *evlist, u64 total_cycles); +int report__browse_block_hists(struct block_hist *bh, float min_percent, + struct evsel *evsel); + #endif /* __PERF_BLOCK_H */ diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h index e6880789864c..10f1ec3e0349 100644 --- a/tools/perf/util/symbol_conf.h +++ b/tools/perf/util/symbol_conf.h @@ -40,6 +40,7 @@ struct symbol_conf { raw_trace, report_hierarchy, report_block, + report_individual_block, inline_name, disable_add2line_warn; const char *vmlinux_name, -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
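The column definitions from the perf-report.txt hunk above ('Sampled Cycles%' and 'Avg Cycles') and the hottest-first ordering can be sketched as plain C. Everything here is an assumption made for illustration: struct block_row and the three functions are hypothetical and do not exist in perf, and the comparator is a simplification that compares raw cycle sums rather than the scaled ratios used by block_total_cycles_pct_sort() in the earlier patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flattened view of one block (perf keeps this in struct block_info). */
struct block_row {
	uint64_t cycles_aggr;	/* sum of sampled cycles for the block */
	uint64_t num_aggr;	/* number of samples aggregated */
};

/* 'Sampled Cycles%' as a percentage; 0 when no cycles were sampled at all. */
double sampled_cycles_pct(const struct block_row *row, uint64_t total_cycles)
{
	if (!total_cycles)
		return 0.0;
	return 100.0 * (double)row->cycles_aggr / (double)total_cycles;
}

/* 'Avg Cycles'; guards against a block with no aggregated samples. */
uint64_t avg_cycles(const struct block_row *row)
{
	return row->num_aggr ? row->cycles_aggr / row->num_aggr : 0;
}

/*
 * qsort() comparator: hottest block first. Every row shares the same
 * denominator, so comparing the raw sums gives the same order as
 * comparing the percentages.
 */
int cmp_hottest_first(const void *a, const void *b)
{
	const struct block_row *l = a, *r = b;

	if (l->cycles_aggr == r->cycles_aggr)
		return 0;
	return l->cycles_aggr > r->cycles_aggr ? -1 : 1;
}
```

An array of rows would be ordered with qsort(rows, n, sizeof(rows[0]), cmp_hottest_first). Returning -1/0/1 from explicit comparisons, rather than a truncated difference, sidesteps any overflow or rounding concerns in the comparator.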
* [PATCH 62/63] perf report: Support --percent-limit for --total-cycles 2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo ` (59 preceding siblings ...) 2019-11-07 19:00 ` [PATCH 61/63] perf report: Sort by sampled cycles percent per block for stdio Arnaldo Carvalho de Melo @ 2019-11-07 19:00 ` Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 63/63] perf report: Sort by sampled cycles percent per block for tui Arnaldo Carvalho de Melo 2019-11-12 11:08 ` [GIT PULL] perf/core improvements and fixes Ingo Molnar 62 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Jin Yao, Arnaldo Carvalho de Melo, Alexander Shishkin, Andi Kleen, Jin Yao, Kan Liang, Peter Zijlstra From: Jin Yao <yao.jin@linux.intel.com> The previous patch added support for the '--total-cycles' option. It is also useful to show only the entries above a threshold percent. This patch enables '--percent-limit' to hide the entries below that percent. For example: perf report --total-cycles --stdio --percent-limit 1 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2M of event 'cycles' # Event count (approx.): 2753248 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ................................................................. .................... 
# 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so Committer testing: From second exapmple onwards slightly edited for brevity: # perf report --total-cycles --percent-limit 2 --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ...................................................................... .................... 
# 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] # # (Tip: Create an archive with symtabs to analyse on other machine: perf archive) # # perf report --total-cycles --percent-limit 1 --stdio # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so # # perf report --total-cycles --percent-limit 0.7 --stdio # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] # ------------------------------------------- It only shows the entries which 'Sampled Cycles%' > 1%. v7: --- No functional change. Only fix the conflict issue because previous patches are changed. v6: --- No functional change. Only fix the conflict issue because previous patches are changed. v5: --- No functional change. Only fix the conflict issue because previous patches are changed. v4: --- No functional change. Only fix the build issue because previous patches are changed. 
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-7-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/builtin-report.c | 2 +- tools/perf/ui/stdio/hist.c | 7 ++++++- tools/perf/util/block-info.c | 10 ++++++++++ tools/perf/util/block-info.h | 2 ++ 4 files changed, 19 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 992b18bdd723..ca41187525ed 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -509,7 +509,7 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist, if (rep->total_cycles_mode) { report__browse_block_hists(&rep->block_reports[i++].hist, - 0, pos); + rep->min_percent, pos); continue; } diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c index 655ef7708cd0..132056c7d5b7 100644 --- a/tools/perf/ui/stdio/hist.c +++ b/tools/perf/ui/stdio/hist.c @@ -15,6 +15,7 @@ #include "../../util/srcline.h" #include "../../util/string2.h" #include "../../util/thread.h" +#include "../../util/block-info.h" #include <linux/ctype.h> #include <linux/zalloc.h> @@ -856,7 +857,11 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows, if (h->filtered) continue; - percent = hist_entry__get_percent_limit(h); + if (symbol_conf.report_individual_block) + percent = block_info__total_cycles_percent(h); + else + percent = hist_entry__get_percent_limit(h); + if (percent < min_pcnt) continue; diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c index ba891751a6ed..597d1205fa6c 100644 --- a/tools/perf/util/block-info.c +++ 
b/tools/perf/util/block-info.c @@ -454,3 +454,13 @@ int report__browse_block_hists(struct block_hist *bh, float min_percent, return 0; } + +float block_info__total_cycles_percent(struct hist_entry *he) +{ + struct block_info *bi = he->block_info; + + if (bi->total_cycles) + return bi->cycles * 100.0 / bi->total_cycles; + + return 0.0; +} diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h index 8309297a6e8f..e4d20bccd9b6 100644 --- a/tools/perf/util/block-info.h +++ b/tools/perf/util/block-info.h @@ -73,4 +73,6 @@ struct block_report *block_info__create_report(struct evlist *evlist, int report__browse_block_hists(struct block_hist *bh, float min_percent, struct evsel *evsel); +float block_info__total_cycles_percent(struct hist_entry *he); + #endif /* __PERF_BLOCK_H */ -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
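The percentage check added by this patch can be illustrated outside of perf. What follows is a minimal sketch, not the perf code itself: struct block_sample and print_blocks() are hypothetical stand-ins for perf's struct block_info and the loop in hists__fprintf(), and only the arithmetic and the "percent < min_percent" test are taken from the diff above.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for perf's struct block_info; only the two
 * fields used by the percentage computation are kept.
 */
struct block_sample {
	unsigned long long cycles;       /* cycles sampled in this block */
	unsigned long long total_cycles; /* cycles sampled over all blocks */
};

/*
 * Same arithmetic as block_info__total_cycles_percent() in the patch:
 * a zero total yields 0.0 instead of dividing by zero.
 */
static float block_percent(const struct block_sample *bs)
{
	if (bs->total_cycles)
		return bs->cycles * 100.0 / bs->total_cycles;

	return 0.0;
}

/*
 * Hypothetical helper: print every entry whose percentage is not below
 * min_percent and return how many entries were printed.
 */
static int print_blocks(const struct block_sample *v, int n, float min_percent)
{
	int shown = 0;

	for (int i = 0; i < n; i++) {
		float percent = block_percent(&v[i]);

		if (percent < min_percent)
			continue; /* same test as in hists__fprintf() */

		printf("%6.2f%%\n", percent);
		shown++;
	}

	return shown;
}
```

Called with entries holding 217 and 72 cycles out of 10000, a limit of 1 keeps only the first entry, while a limit of 0.7 keeps both.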
* [PATCH 63/63] perf report: Sort by sampled cycles percent per block for tui
  2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (60 preceding siblings ...)
  2019-11-07 19:00 ` [PATCH 62/63] perf report: Support --percent-limit for --total-cycles Arnaldo Carvalho de Melo
@ 2019-11-07 19:00 ` Arnaldo Carvalho de Melo
  2019-11-12 11:08 ` [GIT PULL] perf/core improvements and fixes Ingo Molnar
  62 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-07 19:00 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Jin Yao, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Andi Kleen, Jin Yao, Kan Liang,
	Peter Zijlstra

From: Jin Yao <yao.jin@linux.intel.com>

A previous patch implemented the new "--total-cycles" option, but only
stdio mode was supported.

This patch adds support for the tui mode, including '--percent-limit'.

For example,

 perf record -b ./div
 perf report --total-cycles --percent-limit 1

 # Samples: 2753248 of event 'cycles'
 Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles  [Program Block Range]                 Shared Object
          26.04%            2.8M        0.40%          18  [div.c:42 -> div.c:39]                div
          15.17%            1.2M        0.16%           7  [random_r.c:357 -> random_r.c:380]    libc-2.27.so
           5.11%          402.0K        0.04%           2  [div.c:27 -> div.c:28]                div
           4.87%          381.6K        0.04%           2  [random.c:288 -> random.c:291]        libc-2.27.so
           4.53%          381.0K        0.04%           2  [div.c:40 -> div.c:40]                div
           3.85%          300.9K        0.02%           1  [div.c:22 -> div.c:25]                div
           3.08%          241.1K        0.02%           1  [rand.c:26 -> rand.c:27]              libc-2.27.so
           3.06%          240.0K        0.02%           1  [random.c:291 -> random.c:291]        libc-2.27.so
           2.78%          215.7K        0.02%           1  [random.c:298 -> random.c:298]        libc-2.27.so
           2.52%          198.3K        0.02%           1  [random.c:293 -> random.c:293]        libc-2.27.so
           2.36%          184.8K        0.02%           1  [rand.c:28 -> rand.c:28]              libc-2.27.so
           2.33%          180.5K        0.02%           1  [random.c:295 -> random.c:295]        libc-2.27.so
           2.28%          176.7K        0.02%           1  [random.c:295 -> random.c:295]        libc-2.27.so
           2.20%          168.8K        0.02%           1  [rand@plt+0 -> rand@plt+0]            div
           1.98%          158.2K        0.02%           1  [random_r.c:388 -> random_r.c:388]    libc-2.27.so
           1.57%          123.3K        0.02%           1  [div.c:42 -> div.c:44]                div
           1.44%          116.0K        0.42%          19  [random_r.c:357 -> random_r.c:394]    libc-2.27.so

 --------------------------------------------------

 v7:
 ---
 1. Since we have used use_browser in report__browse_block_hists to
    support stdio mode, now we also add support for tui.

 2. Move block tui browser code from ui/browsers/hists.c
    to block-info.c.

 v6:
 ---
 Create report__tui_browse_block_hists in block-info.c
 (code is moved from builtin-report.c).

 v5:
 ---
 Fix a crash issue when running perf report without
 '--total-cycles'. The issue is because the internal flag is
 renamed from 'total_cycles' to 'total_cycles_mode' in previous
 patch but this patch still uses 'total_cycles' to check if the
 '--total-cycles' option is enabled, which causes the code to be
 inconsistent.

 v4:
 ---
 Since the block collection is moved out of printing in
 previous patch, this patch is updated accordingly for
 tui support.

 v3:
 ---
 Minor change since the function name is changed:
 block_total_cycles_percent -> block_info__total_cycles_percent

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191107074719.26139-8-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c    | 27 ++++++++++---
 tools/perf/ui/browsers/hists.c |  7 +++-
 tools/perf/ui/browsers/hists.h |  2 +
 tools/perf/util/block-info.c   | 74 +++++++++++++++++++++++++++++++++-
 4 files changed, 103 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index ca41187525ed..1e81985b7d56 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c @@ -485,6 +485,22 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report return ret + fprintf(fp, "\n#\n"); } +static int perf_evlist__tui_block_hists_browse(struct evlist *evlist, + struct report *rep) +{ + struct evsel *pos; + int i = 0, ret; + + evlist__for_each_entry(evlist, pos) { + ret = report__browse_block_hists(&rep->block_reports[i++].hist, + rep->min_percent, pos); + if (ret != 0) + return ret; + } + + return 0; +} + static int perf_evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, const char *help) @@ -595,6 +611,11 @@ static int report__browse_hists(struct report *rep) switch (use_browser) { case 1: + if (rep->total_cycles_mode) { + ret = perf_evlist__tui_block_hists_browse(evlist, rep); + break; + } + ret = perf_evlist__tui_browse_hists(evlist, help, NULL, rep->min_percent, &session->header.env, @@ -1396,12 +1417,8 @@ int cmd_report(int argc, const char **argv) if (report.total_cycles_mode) { if (sort__mode != SORT_MODE__BRANCH) report.total_cycles_mode = false; - else if (!report.use_stdio) { - pr_err("Error: --total-cycles can be only used together with --stdio\n"); - goto error; - } else { + else sort_order = "sym"; - } } if (strcmp(input_name, "-") != 0) diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c index 7a7187e069b4..334afc2139e7 100644 --- a/tools/perf/ui/browsers/hists.c +++ b/tools/perf/ui/browsers/hists.c @@ -26,6 +26,7 @@ #include "../../util/sort.h" #include "../../util/top.h" #include "../../util/thread.h" +#include "../../util/block-info.h" #include "../../arch/common.h" #include "../../perf.h" @@ -1783,7 +1784,11 @@ static unsigned int hist_browser__refresh(struct ui_browser *browser) continue; } - percent = hist_entry__get_percent_limit(h); + if (symbol_conf.report_individual_block) + percent = block_info__total_cycles_percent(h); + else + percent = hist_entry__get_percent_limit(h); + if (percent < hb->min_pcnt) continue; 
diff --git a/tools/perf/ui/browsers/hists.h b/tools/perf/ui/browsers/hists.h index 91d3e18b50aa..078f2f2c7abd 100644 --- a/tools/perf/ui/browsers/hists.h +++ b/tools/perf/ui/browsers/hists.h @@ -5,6 +5,7 @@ #include "ui/browser.h" struct annotation_options; +struct evsel; struct hist_browser { struct ui_browser b; @@ -15,6 +16,7 @@ struct hist_browser { struct pstack *pstack; struct perf_env *env; struct annotation_options *annotation_opts; + struct evsel *block_evsel; int print_seq; bool show_dso; bool show_headers; diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c index 597d1205fa6c..9abc201ebe63 100644 --- a/tools/perf/util/block-info.c +++ b/tools/perf/util/block-info.c @@ -10,6 +10,7 @@ #include "map.h" #include "srcline.h" #include "evlist.h" +#include "ui/browsers/hists.h" static struct block_header_column { const char *name; @@ -438,9 +439,75 @@ struct block_report *block_info__create_report(struct evlist *evlist, return block_reports; } +#ifdef HAVE_SLANG_SUPPORT +static int block_hists_browser__title(struct hist_browser *browser, char *bf, + size_t size) +{ + struct hists *hists = evsel__hists(browser->block_evsel); + const char *evname = perf_evsel__name(browser->block_evsel); + unsigned long nr_samples = hists->stats.nr_events[PERF_RECORD_SAMPLE]; + int ret; + + ret = scnprintf(bf, size, "# Samples: %lu", nr_samples); + if (evname) + scnprintf(bf + ret, size - ret, " of event '%s'", evname); + + return 0; +} + +static int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel, + float min_percent) +{ + struct hists *hists = &bh->block_hists; + struct hist_browser *browser; + int key = -1; + static const char help[] = + " q Quit \n"; + + browser = hist_browser__new(hists); + if (!browser) + return -1; + + browser->block_evsel = evsel; + browser->title = block_hists_browser__title; + browser->min_pcnt = min_percent; + + /* reset abort key so that it can get Ctrl-C as a key */ + SLang_reset_tty(); + SLang_init_tty(0, 0, 
0); + + while (1) { + key = hist_browser__run(browser, "? - help", true); + + switch (key) { + case 'q': + goto out; + case '?': + ui_browser__help_window(&browser->b, help); + break; + default: + break; + } + } + +out: + hist_browser__delete(browser); + return 0; +} +#else +static int block_hists_tui_browse(struct block_hist *bh __maybe_unused, + struct evsel *evsel __maybe_unused, + float min_percent __maybe_unused) +{ + return 0; +} +#endif + int report__browse_block_hists(struct block_hist *bh, float min_percent, - struct evsel *evsel __maybe_unused) + struct evsel *evsel) { + int ret; + switch (use_browser) { case 0: symbol_conf.report_individual_block = true; @@ -448,6 +515,11 @@ int report__browse_block_hists(struct block_hist *bh, float min_percent, stdout, true); hists__delete_entries(&bh->block_hists); return 0; + case 1: + symbol_conf.report_individual_block = true; + ret = block_hists_tui_browse(bh, evsel, min_percent); + hists__delete_entries(&bh->block_hists); + return ret; default: return -1; } -- 2.21.0 ^ permalink raw reply related [flat|nested] 133+ messages in thread
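Two small pieces of this patch, the selection on use_browser and the browser title, can be tried in isolation. The block below is a rough sketch under stated assumptions rather than the perf implementation: block_report_backend() and block_title() are hypothetical names, snprintf() stands in for the kernel-style scnprintf() used in the diff, and returning -1 on truncation is a choice made for this sketch (the function in the patch always returns 0).

```c
#include <stddef.h>
#include <stdio.h>

/* Values as used in the patch: 0 selects stdio, 1 selects the tui. */
enum { MODE_STDIO = 0, MODE_TUI = 1 };

/*
 * Hypothetical helper mirroring the switch in
 * report__browse_block_hists(): any other mode is rejected.
 */
static const char *block_report_backend(int use_browser)
{
	switch (use_browser) {
	case MODE_STDIO:
		return "stdio";
	case MODE_TUI:
		return "tui";
	default:
		return NULL; /* the patch returns -1 here */
	}
}

/*
 * Builds "# Samples: N", followed by " of event 'NAME'" when a name is
 * given. Unlike the function in the patch, this sketch reports -1 when
 * the first part does not fit in the buffer.
 */
static int block_title(char *bf, size_t size, unsigned long nr_samples,
		       const char *evname)
{
	int ret = snprintf(bf, size, "# Samples: %lu", nr_samples);

	if (ret < 0 || (size_t)ret >= size)
		return -1;

	if (evname)
		snprintf(bf + ret, size - ret, " of event '%s'", evname);

	return 0;
}
```

With 2753248 samples and the event name "cycles", the title matches the first line of the example output in the commit message.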
* Re: [GIT PULL] perf/core improvements and fixes
  2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (61 preceding siblings ...)
  2019-11-07 19:00 ` [PATCH 63/63] perf report: Sort by sampled cycles percent per block for tui Arnaldo Carvalho de Melo
@ 2019-11-12 11:08 ` Ingo Molnar
  62 siblings, 0 replies; 133+ messages in thread
From: Ingo Molnar @ 2019-11-12 11:08 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams,
	linux-kernel, linux-perf-users, Adrian Hunter, Andi Kleen,
	Haiyan Song, Ian Rogers, Igor Lubashev, James Clark, Jin Yao,
	Jiwei Sun, John Garry, Leo Yan, Masami Hiramatsu, Will Deacon,
	Yunfeng Ye, Arnaldo Carvalho de Melo


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo/Thomas,
>
> 	Please consider pulling,
>
> Best regards,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit d44f821b0e13275735e8f3fe4db8703b45f05d52:
>
>   perf/core: Optimize perf_init_event() for TYPE_SOFTWARE (2019-10-28 12:53:28 +0100)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191107
>
> for you to fetch changes up to 7fa46cbf20d327d78114b1c8c7e69fabe7c57794:
>
>   perf report: Sort by sampled cycles percent per block for tui (2019-11-07 10:14:48 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:

>  87 files changed, 22145 insertions(+), 19453 deletions(-)

Pulled, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply	[flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes @ 2020-05-06 15:21 Arnaldo Carvalho de Melo 0 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2020-05-06 15:21 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Daniel Díaz, He Zhe, Hulk Robot, Ian Rogers, Jagadeesh Pagadala, Jin Yao, Kajol Jain, Konstantin Khlebnikov, Leo Yan, Mike Leach, Shaokun Zhang, Stephane Eranian, Thomas Backlund Hi Ingo/Thomas, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit 87cfeb1920f84f465a738d4c6589033eefa20b45: Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2020-04-22 14:08:28 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.8-20200506 for you to fetch changes up to 19ce2321739da5fc27f6a5ed1e1cb15e384ad030: perf flamegraph: Use /bin/bash for report and record scripts (2020-05-05 16:35:32 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf record: - Introduce --switch-output-event to use arbitrary events to be setup and read from a side band thread and, when they take place a signal be sent to the main 'perf record' thread, reusing the --switch-output code to take perf.data snapshots from the --overwrite ring buffer, e.g.: # perf record --overwrite -e sched:* \ --switch-output-event syscalls:*connect* \ workload will take perf.data.YYYYMMDDHHMMSS snapshots up to around the connect syscalls. Stephane Eranian: - Add --num-synthesize-threads option to control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. This mimics pre-existing behaviour in 'perf top'. 
Intel PT: Adrian Hunter: - Add support for synthesizing branch stacks for regular events (cycles, instructions, etc) from Intel PT data. perf bench: Ian Rogers: - Add a multi-threaded synthesize benchmark. - Add kallsyms parsing benchmark. Tommi Rantala: - Fix div-by-zero if runtime is zero. perf synthetic events: - Remove use of sscanf from /proc reading when parsing pre-existing threads to generate synthetic PERF_RECORD_{FORK,MMAP,COMM,etc} events. tools api: - Add a lightweight buffered reading API. libsymbols: - Parse kallsyms using new lightweight buffered reading io API. perf parse-events: - Fix memory leaks found on parse_events. perf mem2node: - Avoid double free related to realloc(). perf stat: Jin Yao: - Zero all the 'ena' and 'run' array slot stats for interval mode. - Improve runtime stat for interval mode Kajol Jain: - Enable Hz/hz printing for --metric-only option - Enhance JSON/metric infrastructure to handle "?". perf tests: Kajol Jain: - Added test for runtime param in metric expression. Tommi Rantala: - Fix data path in the session topology test. perf vendor events power9: Kajol Jain: - Add hv_24x7 socket/chip level metric events Coresight: Leo Yan: - Move definition of 'traceid_list' global variable from header file. Mike Leach: - Update to build with latest opencsd version. perf pmu: Shaokun Zhang: - Fix function name in comment, its get_cpuid_str(), not get_cpustr() Stephane Eranian: - Add perf_pmu__find_by_type() helper perf script: Stephane Eranian: - Remove extraneous newline in perf_sample__fprintf_regs(). Ian Rogers: - Avoid NULL dereference on symbol. tools feature: Stephane Eranian: - Add support for detecting libpfm4. perf symbol: Thomas Richter: - Fix kernel symbol address display in TUI verbose mode. perf cgroup: Tommi Rantala: - Avoid needless closing of unopened fd libperf: He Zhe: - Add NULL pointer check for cpu_map iteration and NULL assignment for all_cpus. Ian Rogers: - Fix a refcount leak in evlist method. 
Arnaldo Carvalho de Melo: - Rename the code in tools/perf/util, i.e. perf tooling specific, that operates on 'struct evsel' to evsel__, leaving the perf_evsel__ namespace for the routines in tools/lib/perf/ that operate on 'struct perf_evsel__'. tools/perf specific libraries: Konstantin Khlebnikov: - Fix reading new topology attribute "core_cpus" - Simplify checking if SMT is active. perf flamegraph: Arnaldo Carvalho de Melo: - Use /bin/bash for report and record scripts, just like all other such scripts, fixing a package dependency bug in a Linaro OpenEmbedded build checker. perf evlist: Jagadeesh Pagadala: - Remove duplicate headers. Miscellaneous: Zou Wei: - Remove unneeded semicolon in libtraceevent, 'perf c2c' and others. - Fix warning assignment of 0/1 to bool variable in 'perf report' Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Adrian Hunter (9): perf thread-stack: Add branch stack support perf intel-pt: Consolidate thread-stack use condition perf intel-pt: Change branch stack support to use thread-stacks perf auxtrace: Add option to synthesize branch stack for regular events perf evsel: Add support for synthesized branch stack sample type perf thread-stack: Add thread_stack__br_sample_late() perf intel-pt: Add support for synthesizing branch stacks for regular events perf intel-pt: Update documentation about itrace G and L options perf intel-pt: Update documentation about using /proc/kcore Arnaldo Carvalho de Melo (44): perf tools: Move routines that probe for perf API features to separate file perf record: Move sb_evlist to 'struct record' perf top: Move sb_evlist to 'struct perf_top' perf bpf: Decouple creating the evlist from adding the SB event perf parse-events: Add parse_events_option() variant that creates evlist perf evlist: Move the sideband thread routines to separate object perf evlist: Allow reusing the side band thread for more purposes libsubcmd: Introduce
OPT_CALLBACK_SET() perf record: Introduce --switch-output-event perf record: Move side band evlist setup to separate routine perf evsel: Rename 'struct perf_evsel__sb_cb_t' to 'struct evsel__sb_cb_t' perf evsel: Rename perf_evsel__nr_cpus() to evsel__nr_cpus() perf evsel: Rename perf_evsel__compute_deltas() to evsel__compute_deltas() perf evsel: Rename perf_evsel__find_pmu() to evsel__find_pmu() perf evsel: Rename perf_evsel__is_aux_event() to evsel__is_aux_event() perf evsel: Rename perf_evsel__exit() to evsel__exit() perf evsel: Rename perf_evsel__config*() to evsel__config*() perf evsel: Rename perf_evsel__calc_id_pos() to evsel__calc_id_pos() perf evsel: Rename __perf_evsel__sample_size() to __evsel__sample_size() perf evsel: Rename *perf_evsel__*name() to *evsel__*name() perf evsel: Rename perf_evsel__group_desc() to evsel__group_desc() perf evsel: Rename *perf_evsel__*set_sample_*() to *evsel__*set_sample_*() perf evsel: Rename perf_evsel__*filter*() to evsel__*filter*() perf evsel: Rename perf_evsel__open_per_*() to evsel__open_per_*() perf evsel: Rename perf_evsel__{str,int}val() and other tracepoint field metehods to to evsel__*() perf evsel: Rename perf_evsel__is_*() to evsel__is*() perf evsel: Ditch perf_evsel__cmp(), not used for quite a while perf evsel: Rename *perf_evsel__read*() to *evsel__read() perf evsel: Rename perf_evsel__parse_sample*() to evsel__parse_sample*() perf evsel: Rename perf_evsel__{prev,next}() to evsel__{prev,next}() perf evsel: Rename perf_evsel__has*() to evsel__has*() perf evsel: Rename perf_evsel__fallback() to evsel__fallback() perf evsel: Rename perf_evsel__group_idx() to evsel__group_idx() perf evsel: Rename perf_evsel__env() to evsel__env() perf evsel: Rename perf_evsel__store_ids() to evsel__store_id() perf stat: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf kmem: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf lock: Rename perf_evsel__*() operating on 'struct evsel *' to 
evsel__*() perf sched: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf flamegraph: Use /bin/bash for report and record scripts He Zhe (1): libperf: Add NULL pointer check for cpu_map iteration and NULL assignment for all_cpus. Ian Rogers (13): perf script: Avoid NULL dereference on symbol perf bench: Add a multi-threaded synthesize benchmark tools api: Add a lightweight buffered reading api perf synthetic events: Remove use of sscanf from /proc reading perf parse-events: Fix memory leaks found on parse_events perf parse-events: Fix memory leaks found on parse_events perf parse-events: Fix another memory leaks found on parse_events() libperf evlist: Fix a refcount leak perf mem2node: Avoid double free related to realloc perf doc: Pass ASCIIDOC_EXTRA as an argument perf bench: Add kallsyms parsing libsymbols kallsyms: Parse using io api libsymbols kallsyms: Move hex2u64 out of header Jagadeesh Pagadala (1): perf evlist: Remove duplicate headers Jin Yao (2): perf stat: Zero all the 'ena' and 'run' array slot stats for interval mode perf stat: Improve runtime stat for interval mode Kajol Jain (4): perf metricgroups: Enhance JSON/metric infrastructure to handle "?" perf tests expr: Added test for runtime param in metric expression perf tools: Enable Hz/hz prinitg for --metric-only option perf vendor events power9: Add hv_24x7 socket/chip level metric events Konstantin Khlebnikov (2): perf tools: Fix reading new topology attribute "core_cpus" perf tools: Simplify checking if SMT is active. 
Leo Yan (1): perf cs-etm: Move definition of 'traceid_list' global variable from header file Mike Leach (1): perf: cs-etm: Update to build with latest opencsd version. Shaokun Zhang (1): perf pmu: Fix function name in comment, its get_cpuid_str(), not get_cpustr() Stephane Eranian (4): perf record: Add num-synthesize-threads option perf script: Remove extraneous newline in perf_sample__fprintf_regs() tools feature: Add support for detecting libpfm4 perf pmu: Add perf_pmu__find_by_type helper Thomas Richter (1): perf symbol: Fix kernel symbol address display Tommi Rantala (3): perf cgroup: Avoid needless closing of unopened fd perf bench: Fix div-by-zero if runtime is zero perf test session topology: Fix data path Zou Wei (4): libtraceevent: Remove unneeded semicolon perf c2c: Remove unneeded semicolon perf tools: Remove unneeded semicolons perf report: Fix warning assignment of 0/1 to bool variable tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 6 +- tools/build/feature/test-libopencsd.c | 4 +- tools/build/feature/test-libpfm4.c | 9 + tools/lib/api/io.h | 115 ++++++++ tools/lib/perf/cpumap.c | 2 +- tools/lib/perf/evlist.c | 4 +- tools/lib/subcmd/parse-options.h | 2 + tools/lib/symbol/kallsyms.c | 86 +++--- tools/lib/symbol/kallsyms.h | 2 - tools/lib/traceevent/kbuffer-parse.c | 2 +- tools/perf/Documentation/itrace.txt | 5 + tools/perf/Documentation/perf-intel-pt.txt | 53 +++- tools/perf/Documentation/perf-record.txt | 17 ++ tools/perf/Documentation/perf-stat.txt | 2 + tools/perf/Makefile.perf | 6 +- tools/perf/arch/arm/util/cs-etm.c | 7 +- tools/perf/arch/arm64/util/arm-spe.c | 12 +- tools/perf/arch/powerpc/util/header.c | 8 + tools/perf/arch/powerpc/util/kvm-stat.c | 2 +- tools/perf/arch/s390/util/kvm-stat.c | 8 +- tools/perf/arch/x86/tests/perf-time-to-tsc.c | 6 +- tools/perf/arch/x86/util/intel-bts.c | 2 +- tools/perf/arch/x86/util/intel-pt.c | 21 +- tools/perf/arch/x86/util/kvm-stat.c | 12 +- tools/perf/bench/Build | 1 + 
 tools/perf/bench/bench.h | 1 +
 tools/perf/bench/epoll-wait.c | 3 +-
 tools/perf/bench/futex-hash.c | 3 +-
 tools/perf/bench/futex-lock-pi.c | 3 +-
 tools/perf/bench/kallsyms-parse.c | 75 +++++
 tools/perf/bench/synthesize.c | 211 ++++++++++++--
 tools/perf/builtin-annotate.c | 15 +-
 tools/perf/builtin-bench.c | 1 +
 tools/perf/builtin-c2c.c | 9 +-
 tools/perf/builtin-diff.c | 8 +-
 tools/perf/builtin-inject.c | 19 +-
 tools/perf/builtin-kmem.c | 65 ++---
 tools/perf/builtin-kvm.c | 23 +-
 tools/perf/builtin-lock.c | 42 ++-
 tools/perf/builtin-mem.c | 2 +-
 tools/perf/builtin-record.c | 117 ++++++--
 tools/perf/builtin-report.c | 21 +-
 tools/perf/builtin-sched.c | 78 +++---
 tools/perf/builtin-script.c | 73 ++---
 tools/perf/builtin-stat.c | 31 +--
 tools/perf/builtin-timechart.c | 52 ++--
 tools/perf/builtin-top.c | 36 ++-
 tools/perf/builtin-trace.c | 115 ++++----
 .../arch/powerpc/power9/nest_metrics.json | 19 ++
 tools/perf/pmu-events/pmu-events.h | 2 +-
 tools/perf/scripts/python/bin/flamegraph-record | 2 +-
 tools/perf/scripts/python/bin/flamegraph-report | 2 +-
 tools/perf/tests/Build | 1 +
 tools/perf/tests/api-io.c | 304 ++++++++++++++++++++
 tools/perf/tests/builtin-test.c | 4 +
 tools/perf/tests/event-times.c | 8 +-
 tools/perf/tests/event_update.c | 2 +-
 tools/perf/tests/evsel-roundtrip-name.c | 20 +-
 tools/perf/tests/evsel-tp-sched.c | 2 +-
 tools/perf/tests/expr.c | 16 +-
 tools/perf/tests/hists_cumulate.c | 8 +-
 tools/perf/tests/mmap-basic.c | 4 +-
 tools/perf/tests/openat-syscall-all-cpus.c | 6 +-
 tools/perf/tests/openat-syscall-tp-fields.c | 6 +-
 tools/perf/tests/openat-syscall.c | 8 +-
 tools/perf/tests/parse-events.c | 138 ++++-----
 tools/perf/tests/perf-record.c | 6 +-
 tools/perf/tests/sample-parsing.c | 6 +-
 tools/perf/tests/switch-tracking.c | 14 +-
 tools/perf/tests/tests.h | 1 +
 tools/perf/tests/topology.c | 12 +-
 tools/perf/ui/browsers/hists.c | 18 +-
 tools/perf/ui/gtk/annotate.c | 2 +-
 tools/perf/ui/gtk/hists.c | 6 +-
 tools/perf/ui/hist.c | 16 +-
 tools/perf/util/Build | 2 +
 tools/perf/util/annotate.c | 20 +-
 tools/perf/util/auxtrace.c | 33 ++-
 tools/perf/util/auxtrace.h | 2 +
 tools/perf/util/bpf-event.c | 3 +-
 tools/perf/util/bpf-event.h | 7 +-
 tools/perf/util/bpf-loader.c | 2 +-
 tools/perf/util/cgroup.c | 3 +-
 tools/perf/util/cloexec.c | 2 +-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 2 +
 tools/perf/util/cs-etm.c | 3 +
 tools/perf/util/cs-etm.h | 3 -
 tools/perf/util/data-convert-bt.c | 6 +-
 tools/perf/util/event.c | 2 +-
 tools/perf/util/evlist.c | 153 +---------
 tools/perf/util/evlist.h | 9 +-
 tools/perf/util/evsel.c | 308 ++++++++++-----------
 tools/perf/util/evsel.h | 180 ++++++------
 tools/perf/util/evsel_config.h | 2 +-
 tools/perf/util/evsel_fprintf.c | 8 +-
 tools/perf/util/expr.c | 11 +-
 tools/perf/util/expr.h | 5 +-
 tools/perf/util/expr.l | 27 +-
 tools/perf/util/header.c | 13 +-
 tools/perf/util/hist.c | 8 +-
 tools/perf/util/intel-bts.c | 6 +-
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c | 2 +-
 tools/perf/util/intel-pt.c | 215 +++++++-------
 tools/perf/util/machine.c | 4 +-
 tools/perf/util/mem2node.c | 3 +-
 tools/perf/util/metricgroup.c | 28 +-
 tools/perf/util/metricgroup.h | 2 +
 tools/perf/util/ordered-events.c | 2 +-
 tools/perf/util/parse-events.c | 39 ++-
 tools/perf/util/parse-events.h | 1 +
 tools/perf/util/parse-events.y | 3 +-
 tools/perf/util/perf_api_probe.c | 164 +++++++++++
 tools/perf/util/perf_api_probe.h | 14 +
 tools/perf/util/pmu.c | 17 +-
 tools/perf/util/pmu.h | 1 +
 tools/perf/util/python.c | 4 +-
 tools/perf/util/record.c | 173 +-----------
 tools/perf/util/record.h | 1 +
 tools/perf/util/s390-cpumsf.c | 3 +-
 .../util/scripting-engines/trace-event-python.c | 6 +-
 tools/perf/util/session.c | 9 +-
 tools/perf/util/sideband_evlist.c | 148 ++++++++++
 tools/perf/util/smt.c | 10 +-
 tools/perf/util/sort.c | 10 +-
 tools/perf/util/stat-display.c | 23 +-
 tools/perf/util/stat-shadow.c | 53 ++--
 tools/perf/util/stat.c | 24 +-
 tools/perf/util/symbol.c | 14 +
 tools/perf/util/synthetic-events.c | 159 +++++++----
 tools/perf/util/thread-stack.c | 217 ++++++++++++++-
 tools/perf/util/thread-stack.h | 8 +-
 tools/perf/util/top.c | 2 +-
 tools/perf/util/top.h | 2 +-
 tools/perf/util/trace-event-read.c | 2 +-
 135 files changed, 2699 insertions(+), 1517 deletions(-)
 create mode 100644 tools/build/feature/test-libpfm4.c
 create mode 100644 tools/lib/api/io.h
 create mode 100644 tools/perf/bench/kallsyms-parse.c
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power9/nest_metrics.json
 create mode 100644 tools/perf/tests/api-io.c
 create mode 100644 tools/perf/util/perf_api_probe.c
 create mode 100644 tools/perf/util/perf_api_probe.h
 create mode 100644 tools/perf/util/sideband_evlist.c

Test results:

The first ones are container-based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from an HTTP server to build them inside the container, to make it easier to build in a container cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.
Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. The plan is to run them on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping to put this in place.

Ubuntu 19.10 and debian experimental are failing when linking against libllvm, which isn't the default. This needs to be investigated; I haven't tested with CC=gcc, but it should be the same problem:

  + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= LIBCLANGLLVM=1 -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
  ...
  /usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_ignoringImpCasts0Matcher::matches(clang::Expr const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const':
  (.text._ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x43): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const'
  /usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_hasLoopVariable0Matcher::matches(clang::CXXForRangeStmt const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const':
  (.text._ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x48): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const'
  ...

It builds ok with the default set of options.

  # export PERF_TARBALL=http://192.168.124.1/perf/perf-5.7.0-rc2.tar.xz
  # dm
   1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
   2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
   3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
   4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
   5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
   6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
   7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
   8 alpine:3.11 : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0)
   9 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.1 (git://git.alpinelinux.org/aports 7c78441134e54efbb34618f457d88c783c913361) (based on LLVM 9.0.1)
  10 alt:p8 : Ok x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final)
  11 alt:p9 : Ok x86_64-alt-linux-gcc (GCC) 8.3.1 20190507 (ALT p9 8.3.1-alt5), clang version 7.0.1
  12 alt:sisyphus : Ok x86_64-alt-linux-gcc (GCC) 9.2.1 20190827 (ALT Sisyphus 9.2.1-alt2), clang version 7.0.1
  13 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
  14 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
  15 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  16 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  17 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  18 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
  19 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
  20 centos:8 : Ok gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), clang version 8.0.1 (Red Hat 8.0.1-1.module_el8.1.0+215+a01033fb)
  21 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.3.1 20200501 releases/gcc-9.3.0-196-gcb2c76c8b1, clang version 10.0.0
  22 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
  23 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
  24 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final)
  25 debian:experimental : FAIL gcc (Debian 9.3.0-11) 9.3.0, clang version 9.0.1-12
  26 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 9.3.0-8) 9.3.0
  27 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
  28 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.3.0-8) 9.3.0
  29 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909
  30 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  31 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
  32 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
  33 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
  34 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  35 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
  36 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
  37 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
  38 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
  39 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
  40 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30)
  41 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
  42 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
  43 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.1 (Fedora 9.0.1-2.fc31)
  44 fedora:32 : Ok gcc (GCC) 10.0.1 20200430 (Red Hat 10.0.1-0.13), clang version 10.0.0 (Fedora 10.0.0-1.fc32)
  45 fedora:rawhide : Ok gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.3.rc2.fc33)
  46 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.2.0-r2 p3) 9.2.0
  47 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
  48 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
  49 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7)
  50 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final)
  51 openmandriva:cooker : Ok gcc (GCC) 10.0.0 20200216 (OpenMandriva), clang version 10.0.0
  52 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548)
  53 opensuse:15.1 : Ok gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
  54 opensuse:15.2 : Ok gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
  55 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553)
  56 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20200128 [revision 83f65674e78d97d27537361de1a9d74067ff228d], clang version 9.0.1
  57 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
  58 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.3)
  59 oraclelinux:8 : Ok gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4.5.0.7), clang version 8.0.1 (Red Hat 8.0.1-1.0.1.module+el8.1.0+5428+345cee14)
  60 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
  61 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
  62 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
  63 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  64 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  65 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  66 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  67 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  68 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  69 ubuntu:18.04 : Ok gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
  70 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
  71 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
  72 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  73 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  74 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  75 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  76 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  77 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  78 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  79 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  80 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final)
  81 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final)
  82 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
  83 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0
  84 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
  85 ubuntu:19.10 : FAIL gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final)
  86 ubuntu:20.04 : Ok gcc (Ubuntu 9.3.0-8ubuntu1) 9.3.0, clang version 10.0.0-1ubuntu1
  #

  # uname -a
  Linux five 5.5.17-200.fc31.x86_64 #1 SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  # git log --oneline -1
  19ce2321739d perf flamegraph: Use /bin/bash for report and record scripts
  # perf version --build-options
  perf version 5.7.rc2.g19ce2321739d
  dwarf: [ on ] # HAVE_DWARF_SUPPORT
  dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT
  glibc: [ on ] # HAVE_GLIBC_SUPPORT
  gtk2: [ on ] # HAVE_GTK2_SUPPORT
  syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT
  libbfd: [ on ] # HAVE_LIBBFD_SUPPORT
  libelf: [ on ] # HAVE_LIBELF_SUPPORT
  libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
  numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
  libperl: [ on ] # HAVE_LIBPERL_SUPPORT
  libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
  libslang: [ on ] # HAVE_SLANG_SUPPORT
  libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT
  libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT
  libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT
  zlib: [ on ] # HAVE_ZLIB_SUPPORT
  lzma: [ on ] # HAVE_LZMA_SUPPORT
  get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT
  bpf: [ on ] # HAVE_LIBBPF_SUPPORT
  aio: [ on ] # HAVE_AIO_SUPPORT
  zstd: [ on ] # HAVE_ZSTD_SUPPORT
  # perf test
   1: vmlinux symtab matches kallsyms : Ok
   2: Detect openat syscall event : Ok
   3: Detect openat syscall event on all cpus : Ok
   4: Read samples using the mmap interface : Ok
   5: Test data source output : Ok
   6: Parse event definition strings : Ok
   7: Simple expression parser : Ok
   8: PERF_RECORD_* events & perf_sample fields : Ok
   9: Parse perf pmu format : Ok
  10: PMU events : Ok
  11: DSO data read : Ok
  12: DSO data cache : Ok
  13: DSO data reopen : Ok
  14: Roundtrip evsel->name : Ok
  15: Parse sched tracepoints fields : Ok
  16: syscalls:sys_enter_openat event fields : Ok
  17: Setup struct perf_event_attr : Ok
  18: Match and link multiple hists : Ok
  19: 'import perf' in python : Ok
  20: Breakpoint overflow signal handler : Ok
  21: Breakpoint overflow sampling : Ok
  22: Breakpoint accounting : Ok
  23: Watchpoint :
  23.1: Read Only Watchpoint : Skip
  23.2: Write Only Watchpoint : Ok
  23.3: Read / Write Watchpoint : Ok
  23.4: Modify Watchpoint : Ok
  24: Number of exit events of a simple workload : Ok
  25: Software clock events period values : Ok
  26: Object code reading : Ok
  27: Sample parsing : Ok
  28: Use a dummy software event to keep tracking : Ok
  29: Parse with no sample_id_all bit set : Ok
  30: Filter hist entries : Ok
  31: Lookup mmap thread : Ok
  32: Share thread maps : Ok
  33: Sort output of hist entries : Ok
  34: Cumulate child hist entries : Ok
  35: Track with sched_switch : Ok
  36: Filter fds with revents mask in a fdarray : Ok
  37: Add fd to a fdarray, making it autogrow : Ok
  38: kmod_path__parse : Ok
  39: Thread map : Ok
  40: LLVM search and compile :
  40.1: Basic BPF llvm compile : Ok
  40.2: kbuild searching : Ok
  40.3: Compile source for BPF prologue generation : Ok
  40.4: Compile source for BPF relocation : Ok
  41: Session topology : Ok
  42: BPF filter :
  42.1: Basic BPF filtering : Ok
  42.2: BPF pinning : Ok
  42.3: BPF prologue generation : Ok
  42.4: BPF relocation checker : Ok
  43: Synthesize thread map : Ok
  44: Remove thread map : Ok
  45: Synthesize cpu map : Ok
  46: Synthesize stat config : Ok
  47: Synthesize stat : Ok
  48: Synthesize stat round : Ok
  49: Synthesize attr update : Ok
  50: Event times : Ok
  51: Read backward ring buffer : Ok
  52: Print cpu map : Ok
  53: Merge cpu map : Ok
  54: Probe SDT events : Ok
  55: is_printable_array : Ok
  56: Print bitmap : Ok
  57: perf hooks : Ok
  58: builtin clang support : Skip (not compiled in)
  59: unit_number__scnprintf : Ok
  60: mem2node : Ok
  61: time utils : Ok
  62: Test jit_write_elf : Ok
  63: Test api io : Ok
  64: maps__merge_in : Ok
  65: x86 rdpmc : Ok
  66: Convert perf time to TSC : Ok
  67: DWARF unwind : Ok
  68: x86 instruction decoder - new instructions : Ok
  69: Intel PT packet decoder : Ok
  70: x86 bp modify : Ok
  71: probe libc's inet_pton & backtrace it with ping : Ok
  72: Use vfs_getname probe to get syscall args filenames : Ok
  73: Check open filename arg using perf trace + vfs_getname: Ok
  74: Zstd perf.data compression/decompression : Ok
  75: Add vfs_getname probe to get syscall args filenames : Ok
  #

  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/perf/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
  make_no_gtk2_O: make NO_GTK2=1
  make_clean_all_O: make clean all
  make_no_libbionic_O: make NO_LIBBIONIC=1
  make_no_libbpf_O: make NO_LIBBPF=1
  make_install_prefix_O: make install prefix=/tmp/krava
  make_tags_O: make tags
  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
  make_perf_o_O: make perf.o
  make_debug_O: make DEBUG=1
  make_doc_O: make doc
  make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1
  make_no_demangle_O: make NO_DEMANGLE=1
  make_util_pmu_bison_o_O: make util/pmu-bison.o
  make_no_libpython_O: make NO_LIBPYTHON=1
  make_no_libaudit_O: make NO_LIBAUDIT=1
  make_install_O: make install
  make_install_bin_O: make install-bin
  make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
  make_pure_O: make
  make_no_libelf_O: make NO_LIBELF=1
  make_no_libnuma_O: make NO_LIBNUMA=1
  make_no_libunwind_O: make NO_LIBUNWIND=1
  make_no_slang_O: make NO_SLANG=1
  make_util_map_o_O: make util/map.o
  make_with_clangllvm_O: make LIBCLANGLLVM=1
  make_no_auxtrace_O: make NO_AUXTRACE=1
  make_no_backtrace_O: make NO_BACKTRACE=1
  make_no_newt_O: make NO_NEWT=1
  make_install_prefix_slash_O: make install prefix=/tmp/krava/
  make_help_O: make help
  make_with_babeltrace_O: make LIBBABELTRACE=1
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
  make_no_libperl_O: make NO_LIBPERL=1
  make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $
* [GIT PULL] perf/core improvements and fixes
@ 2020-04-20 11:52 Arnaldo Carvalho de Melo
  2020-04-22 12:09 ` Ingo Molnar
  0 siblings, 1 reply; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-20 11:52 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Alexey Budankov, Andreas Gerstmayr, He Zhe, Ian Rogers, Kajol Jain, Kan Liang, Konstantin Kharlamov, Stephane Eranian, Thomas Richter, Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

	Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit cd0943357bc7570f081701d005318c20982178b8:

  Merge tag 'perf-urgent-for-mingo-5.7-20200414' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2020-04-16 10:21:31 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.8-20200420

for you to fetch changes up to 12e89e65f446476951f42aedeef56b6bd6f7f1e6:

  perf hist: Add fast path for duplicate entries check (2020-04-18 09:05:01 -0300)

----------------------------------------------------------------
perf/core fixes and improvements:

kernel + tools/perf:

  Alexey Budankov:

  - Introduce CAP_PERFMON to kernel and user space.

callchains:

  Adrian Hunter:

  - Allow using Intel PT to synthesize callchains for regular events.

  Kan Liang:

  - Stitch LBR records from multiple samples to get deeper backtraces, there are caveats, see the csets for details.

perf script:

  Andreas Gerstmayr:

  - Add flamegraph.py script

BPF:

  Jiri Olsa:

  - Synthesize bpf_trampoline/dispatcher ksymbol events.

perf stat:

  Arnaldo Carvalho de Melo:

  - Honour --timeout for forked workloads.

  Stephane Eranian:

  - Force error in fallback on :k events, to avoid counting nothing when the user asks for kernel events but is not allowed to.

perf bench:

  Ian Rogers:

  - Add event synthesis benchmark.

tools api fs:

  Stephane Eranian:

  - Make xxx__mountpoint() more scalable

libtraceevent:

  He Zhe:

  - Handle return value of asprintf.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (18):
      perf script: Simplify auxiliary event printing functions
      perf auxtrace: Add ->evsel_is_auxtrace() callback
      perf intel-pt: Implement ->evsel_is_auxtrace() callback
      perf intel-bts: Implement ->evsel_is_auxtrace() callback
      perf arm-spe: Implement ->evsel_is_auxtrace() callback
      perf cs-etm: Implement ->evsel_is_auxtrace() callback
      perf s390-cpumsf: Implement ->evsel_is_auxtrace() callback
      perf auxtrace: For reporting purposes, un-group AUX area event
      perf auxtrace: Add an option to synthesize callchains for regular events
      perf thread-stack: Add thread_stack__sample_late()
      perf evsel: Be consistent when looking which evsel PERF_SAMPLE_ bits are set
      perf evsel: Add support for synthesized sample type
      perf intel-pt: Add support for synthesizing callchains for regular events
      perf evsel: Move and globalize perf_evsel__find_pmu() and perf_evsel__is_aux_event()
      perf evlist: Move leader-sampling configuration
      perf evsel: Rearrange perf_evsel__config_leader_sampling()
      perf evlist: Allow multiple read formats
      perf tools: Add support for leader-sampling with AUX area events

Alexey Budankov (12):
      capabilities: Introduce CAP_PERFMON to kernel and user space
      perf/core: Open access to the core for CAP_PERFMON privileged process
      perf/core: open access to probes for CAP_PERFMON privileged process
      perf tools: Support CAP_PERFMON capability
      drm/i915/perf: Open access for CAP_PERFMON privileged process
      trace/bpf_trace: Open access for CAP_PERFMON privileged process
      powerpc/perf: open access for CAP_PERFMON privileged process
      parisc/perf: open access for CAP_PERFMON privileged process
      drivers/perf: Open access for CAP_PERFMON privileged process
      drivers/oprofile: Open access for CAP_PERFMON privileged process
      doc/admin-guide: Update perf-security.rst with CAP_PERFMON information
      doc/admin-guide: update kernel.rst with CAP_PERFMON information

Andreas Gerstmayr (1):
      perf script: Add flamegraph.py script

Arnaldo Carvalho de Melo (1):
      perf stat: Honour --timeout for forked workloads

He Zhe (1):
      tools lib traceevent: Take care of return value of asprintf

Ian Rogers (3):
      perf bench: Add event synthesis benchmark
      perf synthetic-events: save 4kb from 2 stack frames
      perf doc: allow ASCIIDOC_EXTRA to be an argument

Jiri Olsa (6):
      perf tools: Synthesize bpf_trampoline/dispatcher ksymbol event
      perf machine: Set ksymbol dso as loaded on arrival
      perf annotate: Add basic support for bpf_image
      perf expr: Add expr_ prefix for parse_ctx and parse_id
      perf expr: Add expr_scanner_ctx object
      perf parser: Add support to specify rXXX event with pmu

Kajol Jain (1):
      perf metrictroup: Split the metricgroup__add_metric function

Kan Liang (15):
      perf pmu: Add support for PMU capabilities
      perf header: Support CPU PMU capabilities
      perf machine: Remove the indent in resolve_lbr_callchain_sample
      perf machine: Refine the function for LBR call stack reconstruction
      perf machine: Factor out lbr_callchain_add_kernel_ip()
      perf machine: Factor out lbr_callchain_add_lbr_ip()
      perf thread: Add a knob for LBR stitch approach
      perf thread: Save previous sample for LBR stitching approach
      perf callchain: Save previous cursor nodes for LBR stitching approach
      perf callchain: Stitch LBR call stack
      perf report: Add option to enable the LBR stitching approach
      perf script: Add option to enable the LBR stitching approach
      perf top: Add option to enable the LBR stitching approach
      perf c2c: Add option to enable the LBR stitching approach
      perf hist: Add fast path for duplicate entries check

Stephane Eranian (2):
      tools api fs: Make xxx__mountpoint() more scalable
      perf stat: Force error in fallback on :k events

 Documentation/admin-guide/perf-security.rst | 86 ++--
 Documentation/admin-guide/sysctl/kernel.rst | 16 +-
 arch/parisc/kernel/perf.c | 2 +-
 arch/powerpc/perf/imc-pmu.c | 4 +-
 drivers/gpu/drm/i915/i915_perf.c | 13 +-
 drivers/oprofile/event_buffer.c | 2 +-
 drivers/perf/arm_spe_pmu.c | 4 +-
 include/linux/capability.h | 4 +
 include/linux/perf_event.h | 6 +-
 include/uapi/linux/capability.h | 8 +-
 kernel/events/core.c | 6 +-
 kernel/trace/bpf_trace.c | 2 +-
 security/selinux/include/classmap.h | 4 +-
 tools/lib/api/fs/fs.c | 17 +
 tools/lib/api/fs/fs.h | 12 +
 tools/lib/traceevent/parse-filter.c | 29 +-
 tools/perf/Documentation/Makefile | 4 +-
 tools/perf/Documentation/itrace.txt | 1 +
 tools/perf/Documentation/perf-bench.txt | 8 +
 tools/perf/Documentation/perf-c2c.txt | 11 +
 tools/perf/Documentation/perf-list.txt | 8 +
 tools/perf/Documentation/perf-report.txt | 11 +
 tools/perf/Documentation/perf-script.txt | 11 +
 tools/perf/Documentation/perf-top.txt | 9 +
 tools/perf/Documentation/perf.data-file-format.txt | 16 +
 tools/perf/bench/Build | 2 +-
 tools/perf/bench/bench.h | 2 +-
 tools/perf/bench/synthesize.c | 101 +++++
 tools/perf/builtin-bench.c | 6 +
 tools/perf/builtin-c2c.c | 12 +
 tools/perf/builtin-ftrace.c | 5 +-
 tools/perf/builtin-report.c | 15 +-
 tools/perf/builtin-script.c | 318 ++++-----------
 tools/perf/builtin-stat.c | 5 +-
 tools/perf/builtin-top.c | 11 +
 tools/perf/design.txt | 3 +-
 tools/perf/scripts/python/bin/flamegraph-record | 2 +
 tools/perf/scripts/python/bin/flamegraph-report | 3 +
 tools/perf/scripts/python/flamegraph.py | 124 ++++++
 tools/perf/tests/expr.c | 4 +-
 tools/perf/tests/parse-events.c | 17 +-
 tools/perf/util/annotate.c | 20 +
 tools/perf/util/arm-spe.c | 9 +
 tools/perf/util/auxtrace.c | 94 +++--
 tools/perf/util/auxtrace.h | 14 +
 tools/perf/util/bpf-event.c | 93 +++++
 tools/perf/util/branch.h | 19 +-
 tools/perf/util/callchain.h | 8 +
 tools/perf/util/cap.h | 4 +
 tools/perf/util/cs-etm.c | 11 +
 tools/perf/util/dso.c | 1 +
 tools/perf/util/dso.h | 1 +
 tools/perf/util/env.h | 3 +
 tools/perf/util/evlist.c | 6 +-
 tools/perf/util/evsel.c | 35 +-
 tools/perf/util/evsel.h | 18 +-
 tools/perf/util/expr.c | 16 +-
 tools/perf/util/expr.h | 16 +-
 tools/perf/util/expr.l | 10 +-
 tools/perf/util/expr.y | 6 +-
 tools/perf/util/header.c | 108 +++++
 tools/perf/util/header.h | 1 +
 tools/perf/util/hist.c | 23 ++
 tools/perf/util/intel-bts.c | 10 +
 tools/perf/util/intel-pt.c | 95 ++++-
 tools/perf/util/machine.c | 434 ++++++++++++++++++---
 tools/perf/util/metricgroup.c | 60 +--
 tools/perf/util/parse-events.l | 1 +
 tools/perf/util/parse-events.y | 9 +
 tools/perf/util/pmu.c | 102 +++++
 tools/perf/util/pmu.h | 9 +
 tools/perf/util/record.c | 62 +++
 tools/perf/util/s390-cpumcf-kernel.h | 1 +
 tools/perf/util/s390-cpumsf.c | 11 +-
 tools/perf/util/sort.c | 2 +-
 tools/perf/util/sort.h | 2 +
 tools/perf/util/stat-shadow.c | 2 +-
 tools/perf/util/symbol.c | 1 +
 tools/perf/util/synthetic-events.c | 22 +-
 tools/perf/util/thread-stack.c | 57 +++
 tools/perf/util/thread-stack.h | 3 +
 tools/perf/util/thread.c | 24 ++
 tools/perf/util/thread.h | 15 +
 tools/perf/util/top.h | 1 +
 tools/perf/util/util.c | 1 +
 85 files changed, 1851 insertions(+), 513 deletions(-)
 create mode 100644 tools/perf/bench/synthesize.c
 create mode 100755 tools/perf/scripts/python/bin/flamegraph-record
 create mode 100755 tools/perf/scripts/python/bin/flamegraph-report
 create mode 100755 tools/perf/scripts/python/flamegraph.py

Test results:

The first ones are container-based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from an HTTP server to build them inside the container, to make it easier to build in a container cluster. Those will come back later.
Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. The plan is to run them on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping to put this in place.

Ubuntu 19.10 is failing when linking against libllvm, which isn't the default. This needs to be investigated; I haven't tested with CC=gcc, but it should be the same problem:

  + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= LIBCLANGLLVM=1 -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
  ...
  /usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_ignoringImpCasts0Matcher::matches(clang::Expr const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const':
  (.text._ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x43): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const'
  /usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_hasLoopVariable0Matcher::matches(clang::CXXForRangeStmt const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const':
  (.text._ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x48): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const'
  ...

It builds ok with the default set of options.
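
With this many containers a failing row is easy to miss in the table that follows. A minimal sketch for pulling out the failures, assuming the 'dm' output has been saved to a file and that each result row has the shape "<index> <image> : <Ok|FAIL> <compilers>" as in that table. The function name list_failed and the file name dm.log are hypothetical, they are not part of perf or of the 'dm' script:

```sh
#!/bin/sh
# Hypothetical helper (not part of perf or the 'dm' script): print the
# container image names whose status column reads FAIL.
# Assumed row layout: <index> <image> : <Ok|FAIL> <compiler versions...>
# Reads the files given as arguments, or stdin when none are given.
list_failed() {
    awk '$1 ~ /^[0-9]+$/ && $3 == ":" && $4 == "FAIL" { print $2 }' "$@"
}
```

It would be used as 'list_failed dm.log', or at the end of a pipe. Requiring a numeric first field keeps shell prompt lines and the linker errors out of the result.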
  # export PERF_TARBALL=http://192.168.124.1/perf/perf-5.7.0-rc1.tar.xz
  # dm
   1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
   2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
   3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
   4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
   5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
   6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
   7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
   8 alpine:3.11 : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0)
   9 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.1 (git://git.alpinelinux.org/aports 7c78441134e54efbb34618f457d88c783c913361) (based on LLVM 9.0.1)
  10 alt:p8 : Ok x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final)
  11 alt:p9 : Ok x86_64-alt-linux-gcc (GCC) 8.3.1 20190507 (ALT p9 8.3.1-alt5), clang version 7.0.1
  12 alt:sisyphus : Ok x86_64-alt-linux-gcc (GCC) 9.2.1 20190827 (ALT Sisyphus 9.2.1-alt2), clang version 7.0.1
  13 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
  14 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
  15 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  16 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  17 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  18 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
  19 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
  20 centos:8 : Ok gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), clang version 8.0.1 (Red Hat 8.0.1-1.module_el8.1.0+215+a01033fb)
  21 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20200214 gcc_9_2_0_release-615-g7866f9ebf1, clang version 9.0.1
  22 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
  23 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
  24 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final)
  25 debian:experimental : Ok gcc (Debian 9.2.1-28) 9.2.1 20200203, clang version 8.0.1-7 (tags/RELEASE_801/final)
  26 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
  27 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
  28 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.2.1-24) 9.2.1 20200117
  29 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909
  30 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  31 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
  32 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
  33 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
  34 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  35 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
  36 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
  37 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
  38 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
  39 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
  40 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30)
  41 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
  42 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
  43 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.1 (Fedora 9.0.1-2.fc31)
  44 fedora:32 : Ok gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.1.rc2.fc32)
  45 fedora:rawhide : Ok gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.3.rc2.fc33)
  46 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.2.0-r2 p3) 9.2.0
  47 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
  48 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
  49 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7)
  50 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final)
  51 openmandriva:cooker : Ok gcc (GCC) 10.0.0 20200216 (OpenMandriva), clang version 10.0.0
  52 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548)
  53 opensuse:15.1 : Ok gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
  54 opensuse:15.2 : Ok gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
  55 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553)
  56 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20200128 [revision 83f65674e78d97d27537361de1a9d74067ff228d], clang version 9.0.1
  57 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat
4.4.7-23.0.1) 58 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.3) 59 oraclelinux:8 : Ok gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4.5.0.5), clang version 8.0.1 (Red Hat 8.0.1-1.0.1.module+el8.1.0+5428+345cee14) 60 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 61 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4 62 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 63 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 65 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 66 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 67 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 68 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 69 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 70 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 74 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 75 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 76 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 77 ubuntu:18.04-x-s390 : Ok 
s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 78 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 79 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 80 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 81 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 82 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 83 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 84 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 85 ubuntu:19.10 : FAIL gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final) 86 ubuntu:20.04 : Ok gcc (Ubuntu 9.3.0-8ubuntu1) 9.3.0, clang version 10.0.0-1ubuntu1 # # uname -a Linux five 5.5.17-200.fc31.x86_64 #1 SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 12e89e65f446 perf hist: Add fast path for duplicate entries check # perf version --build-options perf version 5.7.rc1.g12e89e65f446 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # 
HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: PMU events : Ok 11: DSO data read : Ok 12: DSO data cache : Ok 13: DSO data reopen : Ok 14: Roundtrip evsel->name : Ok 15: Parse sched tracepoints fields : Ok 16: syscalls:sys_enter_openat event fields : Ok 17: Setup struct perf_event_attr : Ok 18: Match and link multiple hists : Ok 19: 'import perf' in python : Ok 20: Breakpoint overflow signal handler : Ok 21: Breakpoint overflow sampling : Ok 22: Breakpoint accounting : Ok 23: Watchpoint : 23.1: Read Only Watchpoint : Skip 23.2: Write Only Watchpoint : Ok 23.3: Read / Write Watchpoint : Ok 23.4: Modify Watchpoint : Ok 24: Number of exit events of a simple workload : Ok 25: Software clock events period values : Ok 26: Object code reading : Ok 27: Sample parsing : Ok 28: Use a dummy software event to keep tracking : Ok 29: Parse with no sample_id_all bit set : Ok 30: Filter hist entries : Ok 31: Lookup mmap thread : Ok 32: Share thread maps : Ok 33: Sort output of hist entries : Ok 34: Cumulate child hist entries : Ok 35: Track with sched_switch : Ok 36: Filter fds with revents mask in a fdarray : Ok 37: Add fd to a fdarray, making it autogrow : Ok 38: kmod_path__parse : Ok 39: Thread map : Ok 40: LLVM search and compile : 40.1: Basic BPF llvm compile : Ok 40.2: kbuild searching : Ok 40.3: Compile source for BPF prologue generation : Ok 40.4: Compile source for BPF relocation : Ok 41: Session topology : Ok 42: BPF filter : 42.1: Basic BPF filtering : Ok 42.2: BPF pinning : Ok 42.3: BPF prologue generation : Ok 42.4: BPF relocation checker : Ok 43: Synthesize thread map : Ok 44: Remove thread map : Ok 45: Synthesize cpu map : Ok 46: 
Synthesize stat config : Ok 47: Synthesize stat : Ok 48: Synthesize stat round : Ok 49: Synthesize attr update : Ok 50: Event times : Ok 51: Read backward ring buffer : Ok 52: Print cpu map : Ok 53: Merge cpu map : Ok 54: Probe SDT events : Ok 55: is_printable_array : Ok 56: Print bitmap : Ok 57: perf hooks : Ok 58: builtin clang support : Skip (not compiled in) 59: unit_number__scnprintf : Ok 60: mem2node : Ok 61: time utils : Ok 62: Test jit_write_elf : Ok 63: maps__merge_in : Ok 64: x86 rdpmc : Ok 65: Convert perf time to TSC : Ok 66: DWARF unwind : Ok 67: x86 instruction decoder - new instructions : Ok 68: Intel PT packet decoder : Ok 69: x86 bp modify : Ok 70: probe libc's inet_pton & backtrace it with ping : Ok 71: Use vfs_getname probe to get syscall args filenames : Ok 72: Check open filename arg using perf trace + vfs_getname: Ok 73: Zstd perf.data compression/decompression : Ok 74: Add vfs_getname probe to get syscall args filenames : Ok # $ git log --oneline -1 ; make -C tools/perf build-test 12e89e65f446 (HEAD -> perf/core, five/perf/core) perf hist: Add fast path for duplicate entries check make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . 
make_no_libbpf_O: make NO_LIBBPF=1 make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_newt_O: make NO_NEWT=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_no_slang_O: make NO_SLANG=1 make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_install_prefix_O: make install prefix=/tmp/krava make_no_demangle_O: make NO_DEMANGLE=1 make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_perf_o_O: make perf.o make_cscope_O: make cscope make_no_libunwind_O: make NO_LIBUNWIND=1 make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_util_pmu_bison_o_O: make util/pmu-bison.o make_clean_all_O: make clean all make_no_libnuma_O: make NO_LIBNUMA=1 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_no_gtk2_O: make NO_GTK2=1 make_tags_O: make tags make_no_libaudit_O: make NO_LIBAUDIT=1 make_help_O: make help make_no_libperl_O: make NO_LIBPERL=1 make_install_O: make install make_no_libelf_O: make NO_LIBELF=1 make_pure_O: make make_install_bin_O: make install-bin make_debug_O: make DEBUG=1 make_with_babeltrace_O: make LIBBABELTRACE=1 make_doc_O: make doc make_no_auxtrace_O: make NO_AUXTRACE=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_util_map_o_O: make util/map.o make_no_libbionic_O: make NO_LIBBIONIC=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2020-04-20 11:52 Arnaldo Carvalho de Melo @ 2020-04-22 12:09 ` Ingo Molnar 2020-04-23 21:28 ` Daniel Díaz 0 siblings, 1 reply; 133+ messages in thread From: Ingo Molnar @ 2020-04-22 12:09 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Alexey Budankov, Andreas Gerstmayr, He Zhe, Ian Rogers, Kajol Jain, Kan Liang, Konstantin Kharlamov, Stephane Eranian, Thomas Richter, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit cd0943357bc7570f081701d005318c20982178b8: > > Merge tag 'perf-urgent-for-mingo-5.7-20200414' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2020-04-16 10:21:31 +0200) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.8-20200420 > > for you to fetch changes up to 12e89e65f446476951f42aedeef56b6bd6f7f1e6: > > perf hist: Add fast path for duplicate entries check (2020-04-18 09:05:01 -0300) > 85 files changed, 1851 insertions(+), 513 deletions(-) Pulled, thanks a lot Arnaldo! Ingo ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2020-04-22 12:09 ` Ingo Molnar @ 2020-04-23 21:28 ` Daniel Díaz 2020-04-24 13:07 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 133+ messages in thread From: Daniel Díaz @ 2020-04-23 21:28 UTC (permalink / raw) To: Ingo Molnar Cc: Arnaldo Carvalho de Melo, Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, open list, linux-perf-users, Adrian Hunter, Alexey Budankov, Andreas Gerstmayr, He Zhe, Ian Rogers, Kajol Jain, Kan Liang, Konstantin Kharlamov, Stephane Eranian, Thomas Richter, Arnaldo Carvalho de Melo, lkft-triage Hello! On Wed, 22 Apr 2020 at 07:09, Ingo Molnar <mingo@kernel.org> wrote: > * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > > Hi Ingo/Thomas, > > > > Please consider pulling, > > > > Best regards, > > > > - Arnaldo > > > > Test results at the end of this message, as usual. > > > > The following changes since commit cd0943357bc7570f081701d005318c20982178b8: > > > > Merge tag 'perf-urgent-for-mingo-5.7-20200414' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2020-04-16 10:21:31 +0200) > > > > are available in the Git repository at: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.8-20200420 > > > > for you to fetch changes up to 12e89e65f446476951f42aedeef56b6bd6f7f1e6: > > > > perf hist: Add fast path for duplicate entries check (2020-04-18 09:05:01 -0300) > > > 85 files changed, 1851 insertions(+), 513 deletions(-) > > Pulled, thanks a lot Arnaldo! Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf script: Add flamegraph.py script"): ERROR: perf-1.0-r9 do_package_qa: QA Issue: /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained in package perf-python requires /usr/bin/sh, but no providers found in RDEPENDS_perf-python? [file-rdeps] This means that there is a new binary pulled in in the shebang line which was unaccounted for: `/usr/bin/sh`. 
I don't see any other usage of /usr/bin/sh in the kernel tree (does not even exist on my Ubuntu dev machine) but plenty of /bin/sh. This patch is needed:

-----8<----------8<----------8<-----
diff --git a/tools/perf/scripts/python/bin/flamegraph-record b/tools/perf/scripts/python/bin/flamegraph-record
index 725d66e71570..a2f3fa25ef81 100755
--- a/tools/perf/scripts/python/bin/flamegraph-record
+++ b/tools/perf/scripts/python/bin/flamegraph-record
@@ -1,2 +1,2 @@
-#!/usr/bin/sh
+#!/bin/sh
 perf record -g "$@"
diff --git a/tools/perf/scripts/python/bin/flamegraph-report b/tools/perf/scripts/python/bin/flamegraph-report
index b1a79afd903b..b0177355619b 100755
--- a/tools/perf/scripts/python/bin/flamegraph-report
+++ b/tools/perf/scripts/python/bin/flamegraph-report
@@ -1,3 +1,3 @@
-#!/usr/bin/sh
+#!/bin/sh
 # description: create flame graphs
 perf script -s "$PERF_EXEC_PATH"/scripts/python/flamegraph.py -- "$@"
----->8---------->8---------->8-----

Greetings!

Daniel Díaz
daniel.diaz@linaro.org

^ permalink raw reply related [flat|nested] 133+ messages in thread
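The QA failure reported above comes down to a shebang naming an interpreter path that the target does not provide. As a rough local approximation, the sketch below walks a directory and reports scripts whose shebang interpreter is not an executable file on the current machine. The function name `check_shebangs` is hypothetical and not part of the perf tree, and the check only looks at the local filesystem, whereas the OpenEmbedded check quoted above reasons about package dependencies, so the two can disagree.

```shell
#!/bin/sh
# Hypothetical helper (not part of the perf tree): report scripts in a
# directory whose shebang names an interpreter that is not executable here.
check_shebangs() {
    dir=$1
    missing=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Read only the first line; tolerate a missing trailing newline.
        IFS= read -r first < "$f" || [ -n "$first" ] || continue
        case $first in
            '#!'*) ;;          # has a shebang, keep going
            *) continue ;;     # no shebang, nothing to check
        esac
        interp=${first#??}                    # drop the leading "#!"
        interp=${interp#"${interp%%[! ]*}"}   # drop leading spaces
        interp=${interp%% *}                  # drop interpreter arguments
        if [ ! -x "$interp" ]; then
            printf '%s: interpreter %s not found\n' "$f" "$interp"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be run from the top of a kernel tree as `check_shebangs tools/perf/scripts/python/bin`. A non-zero return status means at least one script was reported.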
* Re: [GIT PULL] perf/core improvements and fixes 2020-04-23 21:28 ` Daniel Díaz @ 2020-04-24 13:07 ` Arnaldo Carvalho de Melo 2020-04-24 14:10 ` Andreas Gerstmayr 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2020-04-24 13:07 UTC (permalink / raw) To: Andreas Gerstmayr, Daniel Díaz Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, open list, linux-perf-users, Adrian Hunter, Alexey Budankov, He Zhe, Ian Rogers, Kajol Jain, Kan Liang, Konstantin Kharlamov, Stephane Eranian, Thomas Richter, Arnaldo Carvalho de Melo, lkft-triage Em Thu, Apr 23, 2020 at 04:28:46PM -0500, Daniel Díaz escreveu: > On Wed, 22 Apr 2020 at 07:09, Ingo Molnar <mingo@kernel.org> wrote: > > > 85 files changed, 1851 insertions(+), 513 deletions(-) > > Pulled, thanks a lot Arnaldo! > Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf > script: Add flamegraph.py script"): > ERROR: perf-1.0-r9 do_package_qa: QA Issue: > /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained > in package perf-python requires /usr/bin/sh, but no providers found in > RDEPENDS_perf-python? [file-rdeps] yeah, the flamegraph scripts are the outliers, there, everything else is using /bin/bash, so I'll switch to that, ok Andreas? 
[acme@quaco perf]$ vim tools/perf/scripts/python/bin/*
34 files to edit
[acme@quaco perf]$ head -1 tools/perf/scripts/python/bin/*
==> tools/perf/scripts/python/bin/compaction-times-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/compaction-times-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/event_analyzing_sample-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/event_analyzing_sample-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/export-to-postgresql-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/export-to-postgresql-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/export-to-sqlite-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/export-to-sqlite-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/failed-syscalls-by-pid-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/failed-syscalls-by-pid-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/flamegraph-record <==
#!/usr/bin/sh

==> tools/perf/scripts/python/bin/flamegraph-report <==
#!/usr/bin/sh

==> tools/perf/scripts/python/bin/futex-contention-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/futex-contention-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/intel-pt-events-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/intel-pt-events-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/mem-phys-addr-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/mem-phys-addr-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/netdev-times-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/netdev-times-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/net_dropmonitor-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/net_dropmonitor-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/powerpc-hcalls-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/powerpc-hcalls-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/sched-migration-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/sched-migration-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/sctop-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/sctop-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/stackcollapse-record <==
#!/bin/sh

==> tools/perf/scripts/python/bin/stackcollapse-report <==
#!/bin/sh

==> tools/perf/scripts/python/bin/syscall-counts-by-pid-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/syscall-counts-by-pid-report <==
#!/bin/bash

==> tools/perf/scripts/python/bin/syscall-counts-record <==
#!/bin/bash

==> tools/perf/scripts/python/bin/syscall-counts-report <==
#!/bin/bash
[acme@quaco perf]$

^ permalink raw reply [flat|nested] 133+ messages in thread
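The `head -1` survey above can be condensed so that the odd ones out are easier to spot. The following is a minimal sketch, assuming a POSIX shell with `awk`, `sort` and `uniq` available. Both function names (`shebang_tally`, `shebang_outliers`) are hypothetical helpers for illustration, not anything shipped with perf.

```shell
#!/bin/sh
# Hypothetical helper: count how often each first line occurs among the
# regular files in a directory, most common first.
shebang_tally() {
    for f in "$1"/*; do
        # awk always terminates the printed line with a newline.
        [ -f "$f" ] && awk 'NR == 1 { print; exit }' "$f"
    done | sort | uniq -c | sort -rn
}

# Hypothetical helper: list the files whose first line differs from the
# expected shebang, together with the first line that was found.
shebang_outliers() {
    dir=$1
    expected=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        first=$(awk 'NR == 1 { print; exit }' "$f")
        [ "$first" = "$expected" ] || printf '%s: %s\n' "$f" "$first"
    done
}
```

For example, `shebang_outliers tools/perf/scripts/python/bin '#!/bin/bash'` would be expected to flag every script in the listing above that does not start with `#!/bin/bash`, which on that tree includes the stackcollapse pair (`#!/bin/sh`) as well as the flamegraph pair.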
* Re: [GIT PULL] perf/core improvements and fixes 2020-04-24 13:07 ` Arnaldo Carvalho de Melo @ 2020-04-24 14:10 ` Andreas Gerstmayr 2020-05-04 19:07 ` Daniel Díaz 0 siblings, 1 reply; 133+ messages in thread From: Andreas Gerstmayr @ 2020-04-24 14:10 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, Daniel Díaz Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, open list, linux-perf-users, Adrian Hunter, Alexey Budankov, He Zhe, Ian Rogers, Kajol Jain, Kan Liang, Konstantin Kharlamov, Stephane Eranian, Thomas Richter, Arnaldo Carvalho de Melo, lkft-triage On 24.04.20 15:07, Arnaldo Carvalho de Melo wrote: > Em Thu, Apr 23, 2020 at 04:28:46PM -0500, Daniel Díaz escreveu: >> On Wed, 22 Apr 2020 at 07:09, Ingo Molnar <mingo@kernel.org> wrote: >>>> 85 files changed, 1851 insertions(+), 513 deletions(-) > >>> Pulled, thanks a lot Arnaldo! > >> Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf >> script: Add flamegraph.py script"): >> ERROR: perf-1.0-r9 do_package_qa: QA Issue: >> /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained >> in package perf-python requires /usr/bin/sh, but no providers found in >> RDEPENDS_perf-python? [file-rdeps] > > > yeah, the flamegraph scripts are the outliers, there, everything else is > using /bin/bash, so I'll switch to that, ok Andreas? Sure, no problem. Thanks! Cheers, Andreas ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2020-04-24 14:10 ` Andreas Gerstmayr @ 2020-05-04 19:07 ` Daniel Díaz 2020-05-05 16:37 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 133+ messages in thread From: Daniel Díaz @ 2020-05-04 19:07 UTC (permalink / raw) To: Andreas Gerstmayr Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, open list, linux-perf-users, Adrian Hunter, Alexey Budankov, He Zhe, Ian Rogers, Kajol Jain, Kan Liang, Konstantin Kharlamov, Stephane Eranian, Thomas Richter, Arnaldo Carvalho de Melo, lkft-triage Hello! On Fri, 24 Apr 2020 at 09:10, Andreas Gerstmayr <agerstmayr@redhat.com> wrote: > > On 24.04.20 15:07, Arnaldo Carvalho de Melo wrote: > > Em Thu, Apr 23, 2020 at 04:28:46PM -0500, Daniel Díaz escreveu: > >> On Wed, 22 Apr 2020 at 07:09, Ingo Molnar <mingo@kernel.org> wrote: > >>>> 85 files changed, 1851 insertions(+), 513 deletions(-) > > > >>> Pulled, thanks a lot Arnaldo! > > > >> Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf > >> script: Add flamegraph.py script"): > >> ERROR: perf-1.0-r9 do_package_qa: QA Issue: > >> /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained > >> in package perf-python requires /usr/bin/sh, but no providers found in > >> RDEPENDS_perf-python? [file-rdeps] > > > > > > yeah, the flamegraph scripts are the outliers, there, everything else is > > using /bin/bash, so I'll switch to that, ok Andreas? > > Sure, no problem. Thanks! Just a gentle reminder that this can still be fixed in today's linux-next tree (next-20200504). Greetings! Daniel Díaz daniel.diaz@linaro.org ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2020-05-04 19:07 ` Daniel Díaz @ 2020-05-05 16:37 ` Arnaldo Carvalho de Melo 2020-05-05 16:57 ` Daniel Díaz 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2020-05-05 16:37 UTC (permalink / raw) To: Daniel Díaz Cc: Andreas Gerstmayr, Ingo Molnar, Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, open list, linux-perf-users, Adrian Hunter, Alexey Budankov, He Zhe, Ian Rogers, Kajol Jain, Kan Liang, Konstantin Kharlamov, Stephane Eranian, Thomas Richter, lkft-triage Em Mon, May 04, 2020 at 02:07:56PM -0500, Daniel Díaz escreveu: > Hello! > > On Fri, 24 Apr 2020 at 09:10, Andreas Gerstmayr <agerstmayr@redhat.com> wrote: > > > > On 24.04.20 15:07, Arnaldo Carvalho de Melo wrote: > > > Em Thu, Apr 23, 2020 at 04:28:46PM -0500, Daniel Díaz escreveu: > > >> On Wed, 22 Apr 2020 at 07:09, Ingo Molnar <mingo@kernel.org> wrote: > > >>>> 85 files changed, 1851 insertions(+), 513 deletions(-) > > > > > >>> Pulled, thanks a lot Arnaldo! > > > > > >> Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf > > >> script: Add flamegraph.py script"): > > >> ERROR: perf-1.0-r9 do_package_qa: QA Issue: > > >> /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained > > >> in package perf-python requires /usr/bin/sh, but no providers found in > > >> RDEPENDS_perf-python? [file-rdeps] > > > > > > > > > yeah, the flamegraph scripts are the outliers, there, everything else is > > > using /bin/bash, so I'll switch to that, ok Andreas? > > > > Sure, no problem. Thanks! > > Just a gentle reminder that this can still be fixed in today's > linux-next tree (next-20200504). 
Thanks for the reminder, I've just added this to my tree:

commit c74ab13a30d3bec443c116e25b611255c58f32c0
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Tue May 5 13:33:12 2020 -0300

    perf flamegraph: Use /bin/bash for report script

    As all the other tools/perf/scripts/python/bin/*-report scripts, fixing
    this problem reported by Daniel Diaz:

    Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf
    script: Add flamegraph.py script"):
    ERROR: perf-1.0-r9 do_package_qa: QA Issue:
    /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained
    in package perf-python requires /usr/bin/sh, but no providers found in
    RDEPENDS_perf-python? [file-rdeps]

    This means that there is a new binary pulled in in the shebang line
    which was unaccounted for: `/usr/bin/sh`. I don't see any other usage
    of /usr/bin/sh in the kernel tree (does not even exist on my Ubuntu
    dev machine) but plenty of /bin/sh. This patch is needed:
    -----8<----------8<----------8<-----
    diff --git a/tools/perf/scripts/python/bin/flamegraph-record
    b/tools/perf/scripts/python/bin/flamegraph-record
    index 725d66e71570..a2f3fa25ef81 100755
    --- a/tools/perf/scripts/python/bin/flamegraph-record
    +++ b/tools/perf/scripts/python/bin/flamegraph-record
    @@ -1,2 +1,2 @@
    -#!/usr/bin/sh
    +#!/bin/sh
    perf record -g "$@"
    diff --git a/tools/perf/scripts/python/bin/flamegraph-report
    b/tools/perf/scripts/python/bin/flamegraph-report
    index b1a79afd903b..b0177355619b 100755
    --- a/tools/perf/scripts/python/bin/flamegraph-report
    +++ b/tools/perf/scripts/python/bin/flamegraph-report
    @@ -1,3 +1,3 @@
    -#!/usr/bin/sh
    +#!/bin/sh
    # description: create flame graphs
    perf script -s "$PERF_EXEC_PATH"/scripts/python/flamegraph.py -- "$@"
    ----->8---------->8---------->8-----

    Fixes: 5287f9269206 ("perf script: Add flamegraph.py script")
    Reported-by: Daniel Díaz <daniel.diaz@linaro.org>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Andreas Gerstmayr <agerstmayr@redhat.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: lkft-triage@lists.linaro.org
    Cc: Namhyung Kim <namhyung@kernel.org>
    Link: http://lore.kernel.org/lkml/CAEUSe7_wmKS361mKLTB1eYbzYXcKkXdU26BX5BojdKRz8MfPCw@mail.gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/scripts/python/bin/flamegraph-report b/tools/perf/scripts/python/bin/flamegraph-report
index b1a79afd903b..53c5dc90c87e 100755
--- a/tools/perf/scripts/python/bin/flamegraph-report
+++ b/tools/perf/scripts/python/bin/flamegraph-report
@@ -1,3 +1,3 @@
-#!/usr/bin/sh
+#!/bin/bash
 # description: create flame graphs
 perf script -s "$PERF_EXEC_PATH"/scripts/python/flamegraph.py -- "$@"

^ permalink raw reply related [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2020-05-05 16:37 ` Arnaldo Carvalho de Melo @ 2020-05-05 16:57 ` Daniel Díaz 2020-05-05 17:03 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 133+ messages in thread From: Daniel Díaz @ 2020-05-05 16:57 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Andreas Gerstmayr, Ingo Molnar, Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, open list, linux-perf-users, Adrian Hunter, Alexey Budankov, He Zhe, Ian Rogers, Kajol Jain, Kan Liang, Konstantin Kharlamov, Stephane Eranian, Thomas Richter, lkft-triage Hello! On Tue, 5 May 2020 at 11:37, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote: > > Em Mon, May 04, 2020 at 02:07:56PM -0500, Daniel Díaz escreveu: > > Hello! > > > > On Fri, 24 Apr 2020 at 09:10, Andreas Gerstmayr <agerstmayr@redhat.com> wrote: > > > > > > On 24.04.20 15:07, Arnaldo Carvalho de Melo wrote: > > > > Em Thu, Apr 23, 2020 at 04:28:46PM -0500, Daniel Díaz escreveu: > > > >> On Wed, 22 Apr 2020 at 07:09, Ingo Molnar <mingo@kernel.org> wrote: > > > >>>> 85 files changed, 1851 insertions(+), 513 deletions(-) > > > > > > > >>> Pulled, thanks a lot Arnaldo! > > > > > > > >> Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf > > > >> script: Add flamegraph.py script"): > > > >> ERROR: perf-1.0-r9 do_package_qa: QA Issue: > > > >> /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained > > > >> in package perf-python requires /usr/bin/sh, but no providers found in > > > >> RDEPENDS_perf-python? [file-rdeps] > > > > > > > > > > > > yeah, the flamegraph scripts are the outliers, there, everything else is > > > > using /bin/bash, so I'll switch to that, ok Andreas? > > > > > > Sure, no problem. Thanks! > > > > Just a gentle reminder that this can still be fixed in today's > > linux-next tree (next-20200504). 
> > Thanks for the reminder, I've just added this to my tree: > > commit c74ab13a30d3bec443c116e25b611255c58f32c0 > Author: Arnaldo Carvalho de Melo <acme@redhat.com> > Date: Tue May 5 13:33:12 2020 -0300 > > perf flamegraph: Use /bin/bash for report script > > As all the other tools/perf/scripts/python/bin/*-report scripts, fixing > the this problem reported by Daniel Diaz: > > Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf > script: Add flamegraph.py script"): > ERROR: perf-1.0-r9 do_package_qa: QA Issue: > /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained > in package perf-python requires /usr/bin/sh, but no providers found in > RDEPENDS_perf-python? [file-rdeps] > > This means that there is a new binary pulled in in the shebang line > which was unaccounted for: `/usr/bin/sh`. I don't see any other usage > of /usr/bin/sh in the kernel tree (does not even exist on my Ubuntu > dev machine) but plenty of /bin/sh. This patch is needed: > -----8<----------8<----------8<----- > diff --git a/tools/perf/scripts/python/bin/flamegraph-record > b/tools/perf/scripts/python/bin/flamegraph-record > index 725d66e71570..a2f3fa25ef81 100755 > --- a/tools/perf/scripts/python/bin/flamegraph-record > +++ b/tools/perf/scripts/python/bin/flamegraph-record > @@ -1,2 +1,2 @@ > -#!/usr/bin/sh > +#!/bin/sh > perf record -g "$@" > diff --git a/tools/perf/scripts/python/bin/flamegraph-report > b/tools/perf/scripts/python/bin/flamegraph-report > index b1a79afd903b..b0177355619b 100755 > --- a/tools/perf/scripts/python/bin/flamegraph-report > +++ b/tools/perf/scripts/python/bin/flamegraph-report > @@ -1,3 +1,3 @@ > -#!/usr/bin/sh > +#!/bin/sh > # description: create flame graphs > perf script -s "$PERF_EXEC_PATH"/scripts/python/flamegraph.py -- "$@" > ----->8---------->8---------->8----- > > Fixes: 5287f9269206 ("perf script: Add flamegraph.py script") > Reported-by: Daniel Díaz <daniel.diaz@linaro.org> > Cc: Adrian Hunter <adrian.hunter@intel.com> > 
Cc: Andreas Gerstmayr <agerstmayr@redhat.com> > Cc: Jiri Olsa <jolsa@kernel.org> > Cc: lkft-triage@lists.linaro.org > Cc: Namhyung Kim <namhyung@kernel.org> > Link: http://lore.kernel.org/lkml/CAEUSe7_wmKS361mKLTB1eYbzYXcKkXdU26BX5BojdKRz8MfPCw@mail.gmail.com > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > diff --git a/tools/perf/scripts/python/bin/flamegraph-report b/tools/perf/scripts/python/bin/flamegraph-report > index b1a79afd903b..53c5dc90c87e 100755 > --- a/tools/perf/scripts/python/bin/flamegraph-report > +++ b/tools/perf/scripts/python/bin/flamegraph-report > @@ -1,3 +1,3 @@ > -#!/usr/bin/sh > +#!/bin/bash > # description: create flame graphs > perf script -s "$PERF_EXEC_PATH"/scripts/python/flamegraph.py -- "$@" What about flamegraph-record? Thanks and greetings! Daniel Díaz daniel.diaz@linaro.org ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2020-05-05 16:57 ` Daniel Díaz @ 2020-05-05 17:03 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2020-05-05 17:03 UTC (permalink / raw) To: Daniel Díaz Cc: Arnaldo Carvalho de Melo, Andreas Gerstmayr, Ingo Molnar, Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, open list, linux-perf-users, Adrian Hunter, Alexey Budankov, He Zhe, Ian Rogers, Kajol Jain, Kan Liang, Konstantin Kharlamov, Stephane Eranian, Thomas Richter, lkft-triage Em Tue, May 05, 2020 at 11:57:18AM -0500, Daniel Díaz escreveu: > Hello! > > On Tue, 5 May 2020 at 11:37, Arnaldo Carvalho de Melo > <arnaldo.melo@gmail.com> wrote: > > > > Em Mon, May 04, 2020 at 02:07:56PM -0500, Daniel Díaz escreveu: > > > Hello! > > > > > > On Fri, 24 Apr 2020 at 09:10, Andreas Gerstmayr <agerstmayr@redhat.com> wrote: > > > > > > > > On 24.04.20 15:07, Arnaldo Carvalho de Melo wrote: > > > > > Em Thu, Apr 23, 2020 at 04:28:46PM -0500, Daniel Díaz escreveu: > > > > >> On Wed, 22 Apr 2020 at 07:09, Ingo Molnar <mingo@kernel.org> wrote: > > > > >>>> 85 files changed, 1851 insertions(+), 513 deletions(-) > > > > > > > > > >>> Pulled, thanks a lot Arnaldo! > > > > > > > > > >> Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf > > > > >> script: Add flamegraph.py script"): > > > > >> ERROR: perf-1.0-r9 do_package_qa: QA Issue: > > > > >> /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained > > > > >> in package perf-python requires /usr/bin/sh, but no providers found in > > > > >> RDEPENDS_perf-python? [file-rdeps] > > > > > > > > > > > > > > > yeah, the flamegraph scripts are the outliers, there, everything else is > > > > > using /bin/bash, so I'll switch to that, ok Andreas? > > > > > > > > Sure, no problem. Thanks! > > > > > > Just a gentle reminder that this can still be fixed in today's > > > linux-next tree (next-20200504). 
> > > > Thanks for the reminder, I've just added this to my tree: > > > > commit c74ab13a30d3bec443c116e25b611255c58f32c0 > > Author: Arnaldo Carvalho de Melo <acme@redhat.com> > > Date: Tue May 5 13:33:12 2020 -0300 > > > > perf flamegraph: Use /bin/bash for report script > > > > As all the other tools/perf/scripts/python/bin/*-report scripts, fixing > > the this problem reported by Daniel Diaz: > > > > Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf > > script: Add flamegraph.py script"): > > ERROR: perf-1.0-r9 do_package_qa: QA Issue: > > /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained > > in package perf-python requires /usr/bin/sh, but no providers found in > > RDEPENDS_perf-python? [file-rdeps] > > > > This means that there is a new binary pulled in in the shebang line > > which was unaccounted for: `/usr/bin/sh`. I don't see any other usage > > of /usr/bin/sh in the kernel tree (does not even exist on my Ubuntu > > dev machine) but plenty of /bin/sh. 
This patch is needed: > > -----8<----------8<----------8<----- > > diff --git a/tools/perf/scripts/python/bin/flamegraph-record > > b/tools/perf/scripts/python/bin/flamegraph-record > > index 725d66e71570..a2f3fa25ef81 100755 > > --- a/tools/perf/scripts/python/bin/flamegraph-record > > +++ b/tools/perf/scripts/python/bin/flamegraph-record > > @@ -1,2 +1,2 @@ > > -#!/usr/bin/sh > > +#!/bin/sh > > perf record -g "$@" > > diff --git a/tools/perf/scripts/python/bin/flamegraph-report > > b/tools/perf/scripts/python/bin/flamegraph-report > > index b1a79afd903b..b0177355619b 100755 > > --- a/tools/perf/scripts/python/bin/flamegraph-report > > +++ b/tools/perf/scripts/python/bin/flamegraph-report > > @@ -1,3 +1,3 @@ > > -#!/usr/bin/sh > > +#!/bin/sh > > # description: create flame graphs > > perf script -s "$PERF_EXEC_PATH"/scripts/python/flamegraph.py -- "$@" > > ----->8---------->8---------->8----- > > > > Fixes: 5287f9269206 ("perf script: Add flamegraph.py script") > > Reported-by: Daniel Díaz <daniel.diaz@linaro.org> > > Cc: Adrian Hunter <adrian.hunter@intel.com> > > Cc: Andreas Gerstmayr <agerstmayr@redhat.com> > > Cc: Jiri Olsa <jolsa@kernel.org> > > Cc: lkft-triage@lists.linaro.org > > Cc: Namhyung Kim <namhyung@kernel.org> > > Link: http://lore.kernel.org/lkml/CAEUSe7_wmKS361mKLTB1eYbzYXcKkXdU26BX5BojdKRz8MfPCw@mail.gmail.com > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > > > diff --git a/tools/perf/scripts/python/bin/flamegraph-report b/tools/perf/scripts/python/bin/flamegraph-report > > index b1a79afd903b..53c5dc90c87e 100755 > > --- a/tools/perf/scripts/python/bin/flamegraph-report > > +++ b/tools/perf/scripts/python/bin/flamegraph-report > > @@ -1,3 +1,3 @@ > > -#!/usr/bin/sh > > +#!/bin/bash > > # description: create flame graphs > > perf script -s "$PERF_EXEC_PATH"/scripts/python/flamegraph.py -- "$@" > > What about flamegraph-record? 
oops, make that this instead:

commit b3a63d0c17e6e1d23a6b44502b55f066adfd8e6a
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Tue May 5 13:33:12 2020 -0300

    perf flamegraph: Use /bin/bash for report and record scripts

    As all the other tools/perf/scripts/python/bin/*-{report,record}
    scripts, fixing this problem reported by Daniel Diaz:

      Our OpenEmbedded builds detected an issue with 5287f9269206 ("perf
      script: Add flamegraph.py script"):
        ERROR: perf-1.0-r9 do_package_qa: QA Issue:
      /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained
      in package perf-python requires /usr/bin/sh, but no providers found in
      RDEPENDS_perf-python? [file-rdeps]

      This means that there is a new binary pulled in in the shebang line
      which was unaccounted for: `/usr/bin/sh`. I don't see any other usage
      of /usr/bin/sh in the kernel tree (does not even exist on my Ubuntu
      dev machine) but plenty of /bin/sh. This patch is needed:
      -----8<----------8<----------8<-----
      diff --git a/tools/perf/scripts/python/bin/flamegraph-record b/tools/perf/scripts/python/bin/flamegraph-record
      index 725d66e71570..a2f3fa25ef81 100755
      --- a/tools/perf/scripts/python/bin/flamegraph-record
      +++ b/tools/perf/scripts/python/bin/flamegraph-record
      @@ -1,2 +1,2 @@
      -#!/usr/bin/sh
      +#!/bin/sh
       perf record -g "$@"
      diff --git a/tools/perf/scripts/python/bin/flamegraph-report b/tools/perf/scripts/python/bin/flamegraph-report
      index b1a79afd903b..b0177355619b 100755
      --- a/tools/perf/scripts/python/bin/flamegraph-report
      +++ b/tools/perf/scripts/python/bin/flamegraph-report
      @@ -1,3 +1,3 @@
      -#!/usr/bin/sh
      +#!/bin/sh
       # description: create flame graphs
       perf script -s "$PERF_EXEC_PATH"/scripts/python/flamegraph.py -- "$@"
      ----->8---------->8---------->8-----

    Fixes: 5287f9269206 ("perf script: Add flamegraph.py script")
    Reported-by: Daniel Díaz <daniel.diaz@linaro.org>
    Acked-by: Andreas Gerstmayr <agerstmayr@redhat.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: lkft-triage@lists.linaro.org
    Cc: Namhyung Kim <namhyung@kernel.org>
    Link: http://lore.kernel.org/lkml/CAEUSe7_wmKS361mKLTB1eYbzYXcKkXdU26BX5BojdKRz8MfPCw@mail.gmail.com
    Link: http://lore.kernel.org/lkml/20200505163745.GD3777@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/scripts/python/bin/flamegraph-record b/tools/perf/scripts/python/bin/flamegraph-record
index 725d66e71570..7df5a19c0163 100755
--- a/tools/perf/scripts/python/bin/flamegraph-record
+++ b/tools/perf/scripts/python/bin/flamegraph-record
@@ -1,2 +1,2 @@
-#!/usr/bin/sh
+#!/bin/bash
 perf record -g "$@"
diff --git a/tools/perf/scripts/python/bin/flamegraph-report b/tools/perf/scripts/python/bin/flamegraph-report
index b1a79afd903b..53c5dc90c87e 100755
--- a/tools/perf/scripts/python/bin/flamegraph-report
+++ b/tools/perf/scripts/python/bin/flamegraph-report
@@ -1,3 +1,3 @@
-#!/usr/bin/sh
+#!/bin/bash
 # description: create flame graphs
 perf script -s "$PERF_EXEC_PATH"/scripts/python/flamegraph.py -- "$@"

^ permalink raw reply related	[flat|nested] 133+ messages in thread
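The mismatch discussed in this subthread (two scripts using `#!/usr/bin/sh` while their siblings use `/bin/bash`) is the kind of thing a small audit can catch before a packaging QA step does. The following is only a sketch, not something from the perf tree: the helper names `read_interpreter` and `find_unexpected_interpreters` are hypothetical, and the allowed set of `/bin/bash` is an assumption taken from the remark above that everything else uses `/bin/bash`.

```python
import os
import tempfile


def read_interpreter(path):
    """Return the interpreter named on a script's shebang line, or None."""
    with open(path, "rb") as f:
        first = f.readline()
    if not first.startswith(b"#!"):
        return None
    parts = first[2:].decode("utf-8", "replace").split()
    return parts[0] if parts else None


def find_unexpected_interpreters(directory, allowed=frozenset({"/bin/bash"})):
    """Map file name -> interpreter for scripts whose shebang is not allowed.

    Files without a shebang line are ignored.
    """
    unexpected = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        interp = read_interpreter(path)
        if interp is not None and interp not in allowed:
            unexpected[name] = interp
    return unexpected


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics the state before the fix
    # for flamegraph-record; no real kernel tree is touched.
    with tempfile.TemporaryDirectory() as d:
        samples = {
            "flamegraph-record": '#!/usr/bin/sh\nperf record -g "$@"\n',
            "flamegraph-report": "#!/bin/bash\n# description: create flame graphs\n",
        }
        for name, body in samples.items():
            with open(os.path.join(d, name), "w") as f:
                f.write(body)
        print(find_unexpected_interpreters(d))
        # prints {'flamegraph-record': '/usr/bin/sh'}
```

Pointing `find_unexpected_interpreters` at a checkout's tools/perf/scripts/python/bin/ directory would list any script whose interpreter differs from the rest.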
* [GIT PULL] perf/core improvements and fixes
@ 2020-03-25 12:41 Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-25 12:41 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Arnaldo Carvalho de Melo, Christophe JAILLET,
	David Laight, Ian Rogers, Jin Yao, John Garry, Kajol Jain,
	Leo Yan, Mike Leach, Naveen N . Rao, Ravi Bangoria,
	Vijay Thakkar, Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

	Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 3442a9ecb8e72a33c28a2b969b766c659830e410:

  perf/x86/intel/uncore: Factor out __snr_uncore_mmio_init_box (2020-03-20 13:06:23 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.7-20200325

for you to fetch changes up to 0d33b34352531ff7029c58eda2321340c0ea3f5f:

  perf dso: Fix dso comparison (2020-03-24 10:57:38 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf report/top:

  Jin Yao:

  - Support annotation of unresolved symbols, just using their addresses.

  - Print addr_location.al_addr when finding a map but not a symbol, so that we
    have the address relative to the map, which is what objdump produces; then
    we can match the output of perf and objdump for such unresolved addresses.

  - Allow sorting by non-group leaders when working with multiple events, be it
    in an explicit group, i.e. an event list surrounded by {} (e.g.
    perf record -e '{cycles,instructions,cache-misses}'), or without, using
    --group in 'perf report', e.g.:

      perf record -e cycles,instructions,cache-misses
      perf report --group --group-sort-idx 1

    That '1' will ask for the output to be sorted by 'instructions', not the
    default 'cycles'.

  - Add hotkeys to interactively re-sort the output when using multiple events,
    '0', '1', ... '9' to re-sort by the nth event, just like when using
    --group-sort-idx, as explained above.

perf stat:

  Jin Yao:

  - Align the output for interval aggregation mode.

event parsing:

  Ian Rogers:

  - Fix 3 use after frees found with clang ASAN.

perf tools:

  Jiri Olsa:

  - Unify a bit the build directory output.

perf tests:

  John Garry:

  - Add PMU events tests, checking that JSON files are properly parsed.

perf stat:

  Kajol Jain:

  - Fix printing event names of metric group with multiple events in case of
    overlapping events.

perf symbols:

  Leo Yan:

  - Consolidate symbol fixup issue.

vendor events AMD:

  Vijay Thakkar:

  - Restrict model detection for zen1 based processors.

  - Add Zen2 events.

  - Update Zen1 events to V2.

perf cpumap:

  Christophe JAILLET:

  - Fix snprintf overflow check.

DSOs:

  Ravi Bangoria:

  - Fix dso comparison wrt IDs (maj, min, etc), which had made 'perf archive'
    stop working when build-ids were not being collected.

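The zero-based indexing behind --group-sort-idx described above can be illustrated with a small sketch. This is not how perf implements it; the function `sort_rows_by_event`, the row layout, and every number below are made up purely to show that index 1 selects the second event ('instructions') as the sort key.

```python
# Event order as given on the command line in the example above.
EVENTS = ["cycles", "instructions", "cache-misses"]


def sort_rows_by_event(rows, sort_idx=0):
    """Sort (symbol, [overhead per event]) rows by one event, descending.

    sort_idx is the zero-based position of the event in the event list;
    0 corresponds to the default (the group leader).
    """
    if not rows:
        return []
    n_events = len(rows[0][1])
    if not 0 <= sort_idx < n_events:
        raise IndexError("sort_idx %d out of range for %d events" % (sort_idx, n_events))
    return sorted(rows, key=lambda row: row[1][sort_idx], reverse=True)


if __name__ == "__main__":
    # Hypothetical overhead percentages, one column per entry in EVENTS.
    rows = [
        ("main", [40.0, 10.0, 5.0]),
        ("memcpy", [35.0, 50.0, 60.0]),
        ("parse", [25.0, 40.0, 35.0]),
    ]
    for idx in (0, 1):
        order = [sym for sym, _ in sort_rows_by_event(rows, idx)]
        print("sorted by %s: %s" % (EVENTS[idx], order))
```

With index 0 the rows come out ordered by the first column, and with index 1 by the second, which mirrors what pressing the '0' or '1' hotkey is described as doing in the TUI.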
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Arnaldo Carvalho de Melo (1): tools headers uapi: Update linux/in.h copy Christophe JAILLET (1): perf cpumap: Fix snprintf overflow check Ian Rogers (1): perf parse-events: Fix 3 use after frees found with clang ASAN Jin Yao (7): perf report: Print al_addr when symbol is not found perf report: Support interactive annotation of code without symbols perf report/top TUI: Support hotkey 'a' for annotation of unresolved addresses perf report: Allow specifying event to be used as sort key in --group output perf report: Support a new key to reload the browser perf report/top TUI: Support hotkeys to let user select any event for sorting perf stat: Align the output for interval aggregation mode Jiri Olsa (1): perf tools: Unify a bit the build directory output John Garry (7): perf jevents: Add some test events perf jevents: Support test events folder perf pmu: Refactor pmu_add_cpu_aliases() perf test: Add pmu-events test perf pmu: Add is_pmu_core() perf pmu: Make pmu_uncore_alias_match() public perf test: Test pmu-events aliases Kajol Jain (1): perf metricgroup: Fix printing event names of metric group with multiple events incase of overlapping events Leo Yan (1): perf symbols: Consolidate symbol fixup issue Ravi Bangoria (1): perf dso: Fix dso comparison Vijay Thakkar (3): perf vendor events amd: Restrict model detection for zen1 based processors perf vendor events amd: Add Zen2 events perf vendor events amd: Update Zen1 events to V2 tools/include/uapi/linux/in.h | 2 + tools/perf/Documentation/perf-report.txt | 5 + tools/perf/Makefile.perf | 9 +- tools/perf/arch/arm64/util/Build | 1 - tools/perf/arch/arm64/util/sym-handling.c | 19 -- tools/perf/arch/powerpc/util/Build | 1 - tools/perf/arch/powerpc/util/sym-handling.c | 10 - tools/perf/builtin-report.c | 16 +- .../{x86/amdfam17h => test/test_cpu}/branch.json | 0 
.../perf/pmu-events/arch/test/test_cpu/other.json | 26 ++ .../perf/pmu-events/arch/test/test_cpu/uncore.json | 21 ++ .../perf/pmu-events/arch/x86/amdfam17h/cache.json | 329 ------------------ .../perf/pmu-events/arch/x86/amdfam17h/other.json | 65 ---- tools/perf/pmu-events/arch/x86/amdzen1/branch.json | 23 ++ tools/perf/pmu-events/arch/x86/amdzen1/cache.json | 294 ++++++++++++++++ .../arch/x86/{amdfam17h => amdzen1}/core.json | 15 +- .../x86/{amdfam17h => amdzen1}/floating-point.json | 64 +++- .../arch/x86/{amdfam17h => amdzen1}/memory.json | 82 +++-- tools/perf/pmu-events/arch/x86/amdzen1/other.json | 56 +++ tools/perf/pmu-events/arch/x86/amdzen2/branch.json | 52 +++ tools/perf/pmu-events/arch/x86/amdzen2/cache.json | 338 ++++++++++++++++++ tools/perf/pmu-events/arch/x86/amdzen2/core.json | 130 +++++++ .../arch/x86/amdzen2/floating-point.json | 140 ++++++++ tools/perf/pmu-events/arch/x86/amdzen2/memory.json | 341 ++++++++++++++++++ tools/perf/pmu-events/arch/x86/amdzen2/other.json | 115 +++++++ tools/perf/pmu-events/arch/x86/mapfile.csv | 3 +- tools/perf/pmu-events/jevents.c | 30 ++ tools/perf/tests/Build | 1 + tools/perf/tests/builtin-test.c | 4 + tools/perf/tests/pmu-events.c | 379 +++++++++++++++++++++ tools/perf/tests/tests.h | 1 + tools/perf/ui/browsers/hists.c | 118 ++++++- tools/perf/ui/hist.c | 93 ++++- tools/perf/ui/keysyms.h | 1 + tools/perf/util/annotate.h | 1 + tools/perf/util/cpumap.c | 10 +- tools/perf/util/dsos.c | 22 +- tools/perf/util/evsel.c | 1 + tools/perf/util/hist.h | 1 + tools/perf/util/metricgroup.c | 49 +-- tools/perf/util/parse-events.c | 6 +- tools/perf/util/pmu.c | 28 +- tools/perf/util/pmu.h | 5 + tools/perf/util/sort.c | 6 +- tools/perf/util/stat-display.c | 6 +- tools/perf/util/symbol-elf.c | 10 +- tools/perf/util/symbol_conf.h | 1 + 47 files changed, 2374 insertions(+), 556 deletions(-) delete mode 100644 tools/perf/arch/arm64/util/sym-handling.c rename tools/perf/pmu-events/arch/{x86/amdfam17h => test/test_cpu}/branch.json (100%) 
create mode 100644 tools/perf/pmu-events/arch/test/test_cpu/other.json create mode 100644 tools/perf/pmu-events/arch/test/test_cpu/uncore.json delete mode 100644 tools/perf/pmu-events/arch/x86/amdfam17h/cache.json delete mode 100644 tools/perf/pmu-events/arch/x86/amdfam17h/other.json create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/branch.json create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/cache.json rename tools/perf/pmu-events/arch/x86/{amdfam17h => amdzen1}/core.json (87%) rename tools/perf/pmu-events/arch/x86/{amdfam17h => amdzen1}/floating-point.json (61%) rename tools/perf/pmu-events/arch/x86/{amdfam17h => amdzen1}/memory.json (63%) create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/other.json create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/branch.json create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/cache.json create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/core.json create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/floating-point.json create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/memory.json create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/other.json create mode 100644 tools/perf/tests/pmu-events.c Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. 
Several are cross builds (the ones with -x-ARCH and the android one), and those may not have all the features built, due to the lack of multi-arch devel packages, which are available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one performs a variety of tests exercising tools/perf/util/ and tools/lib/{bpf,traceevent,etc}, and also runs perf commands with a variety of command line event specifications, intercepting the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. The plan is to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping to get this in place.

Ubuntu 19.10 is failing when linking against libllvm, which isn't the default. This needs to be investigated; I haven't tested with CC=gcc, but it should be the same problem:

  + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= LIBCLANGLLVM=1 -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
  ...
/usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_ignoringImpCasts0Matcher::matches(clang::Expr const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const': (.text._ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x43): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const' /usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_hasLoopVariable0Matcher::matches(clang::CXXForRangeStmt const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const': (.text._ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x48): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const' ... It builds ok with the default set of options. 
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.6.0-rc6.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:3.11 : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0) 9 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.1 (git://git.alpinelinux.org/aports 7c78441134e54efbb34618f457d88c783c913361) (based on LLVM 9.0.1) 10 alt:p8 : Ok x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final) 11 alt:p9 : Ok x86_64-alt-linux-gcc (GCC) 8.3.1 20190507 (ALT p9 8.3.1-alt5), clang version 7.0.1 12 alt:sisyphus : Ok x86_64-alt-linux-gcc (GCC) 9.2.1 20190827 (ALT Sisyphus 9.2.1-alt2), clang version 7.0.1 13 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 14 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 15 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 16 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 17 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 18 centos:6 : Ok gcc 
(GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 19 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) 20 centos:8 : Ok gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), clang version 8.0.1 (Red Hat 8.0.1-1.module_el8.1.0+215+a01033fb) 21 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20200214 gcc_9_2_0_release-615-g7866f9ebf1, clang version 9.0.1 22 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 23 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 24 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 25 debian:experimental : Ok gcc (Debian 9.2.1-28) 9.2.1 20200203, clang version 8.0.1-7 (tags/RELEASE_801/final) 26 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 27 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 28 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.2.1-24) 9.2.1 20200117 29 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909 30 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) 31 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 32 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 33 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 34 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 35 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 36 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 37 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 38 fedora:28 : Ok gcc 
(GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 39 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 40 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 41 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 42 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 43 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.1 (Fedora 9.0.1-2.fc31) 44 fedora:32 : Ok gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.1.rc2.fc32) 45 fedora:rawhide : Ok gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.3.rc2.fc33) 46 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.2.0-r2 p3) 9.2.0 47 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 48 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 49 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 50 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final) 51 openmandriva:cooker : Ok gcc (GCC) 10.0.0 20200216 (OpenMandriva), clang version 10.0.0 52 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 53 opensuse:15.1 : Ok gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 54 opensuse:15.2 : Ok gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 55 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 56 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20200128 [revision 83f65674e78d97d27537361de1a9d74067ff228d], clang version 9.0.1 57 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 
4.4.7-23.0.1) 58 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.3) 59 oraclelinux:8 : Ok gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4.5.0.5), clang version 8.0.1 (Red Hat 8.0.1-1.0.1.module+el8.1.0+5428+345cee14) 60 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 61 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4 62 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 63 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 65 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 66 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 67 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 68 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 69 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 70 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 74 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 75 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 76 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 77 ubuntu:18.04-x-s390 : Ok 
s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 78 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 79 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 80 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 81 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 82 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 83 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 84 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 85 ubuntu:19.10 : FAIL gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final) # # uname -a Linux five 5.5.10-200.fc31.x86_64 #1 SMP Wed Mar 18 14:21:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 0d33b3435253 perf dso: Fix dso comparison # perf version --build-options perf version 5.6.rc6.g9a13a0215c8d dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat 
syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: PMU events : Ok 11: DSO data read : Ok 12: DSO data cache : Ok 13: DSO data reopen : Ok 14: Roundtrip evsel->name : Ok 15: Parse sched tracepoints fields : Ok 16: syscalls:sys_enter_openat event fields : Ok 17: Setup struct perf_event_attr : Ok 18: Match and link multiple hists : Ok 19: 'import perf' in python : Ok 20: Breakpoint overflow signal handler : Ok 21: Breakpoint overflow sampling : Ok 22: Breakpoint accounting : Ok 23: Watchpoint : 23.1: Read Only Watchpoint : Skip 23.2: Write Only Watchpoint : Ok 23.3: Read / Write Watchpoint : Ok 23.4: Modify Watchpoint : Ok 24: Number of exit events of a simple workload : Ok 25: Software clock events period values : Ok 26: Object code reading : Ok 27: Sample parsing : Ok 28: Use a dummy software event to keep tracking : Ok 29: Parse with no sample_id_all bit set : Ok 30: Filter hist entries : Ok 31: Lookup mmap thread : Ok 32: Share thread maps : Ok 33: Sort output of hist entries : Ok 34: Cumulate child hist entries : Ok 35: Track with sched_switch : Ok 36: Filter fds with revents mask in a fdarray : Ok 37: Add fd to a fdarray, making it autogrow : Ok 38: kmod_path__parse : Ok 39: Thread map : Ok 40: LLVM search and compile : 40.1: Basic BPF llvm compile : Ok 40.2: kbuild searching : Ok 40.3: Compile source for BPF prologue generation : Ok 40.4: Compile source for BPF relocation : Ok 41: Session topology : Ok 42: BPF filter : 42.1: Basic BPF filtering : Ok 42.2: BPF pinning : Ok 42.3: BPF prologue generation : Ok 42.4: BPF relocation checker : Ok 43: Synthesize thread map : Ok 44: Remove thread map : Ok 45: Synthesize cpu map : Ok 46: Synthesize stat config : Ok 47: Synthesize stat : Ok 48: Synthesize stat round : Ok 49: Synthesize attr update : Ok 50: 
Event times : Ok 51: Read backward ring buffer : Ok 52: Print cpu map : Ok 53: Merge cpu map : Ok 54: Probe SDT events : Ok 55: is_printable_array : Ok 56: Print bitmap : Ok 57: perf hooks : Ok 58: builtin clang support : Skip (not compiled in) 59: unit_number__scnprintf : Ok 60: mem2node : Ok 61: time utils : Ok 62: Test jit_write_elf : Ok 63: maps__merge_in : Ok 64: x86 rdpmc : Ok 65: Convert perf time to TSC : Ok 66: DWARF unwind : Ok 67: x86 instruction decoder - new instructions : Ok 68: Intel PT packet decoder : Ok 69: x86 bp modify : Ok 70: probe libc's inet_pton & backtrace it with ping : Ok 71: Use vfs_getname probe to get syscall args filenames : Ok 72: Check open filename arg using perf trace + vfs_getname: Ok 73: Zstd perf.data compression/decompression : Ok 74: Add vfs_getname probe to get syscall args filenames : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_tags_O: make tags make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_no_libperl_O: make NO_LIBPERL=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_libelf_O: make NO_LIBELF=1 make_debug_O: make DEBUG=1 make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_newt_O: make NO_NEWT=1 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_install_bin_O: make install-bin make_help_O: make help make_pure_O: make make_no_slang_O: make NO_SLANG=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_no_gtk2_O: make NO_GTK2=1 make_with_clangllvm_O: 
make LIBCLANGLLVM=1 make_no_demangle_O: make NO_DEMANGLE=1 make_util_map_o_O: make util/map.o make_no_backtrace_O: make NO_BACKTRACE=1 make_install_prefix_O: make install prefix=/tmp/krava make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_doc_O: make doc make_util_pmu_bison_o_O: make util/pmu-bison.o make_no_libbpf_O: make NO_LIBBPF=1 make_perf_o_O: make perf.o make_clean_all_O: make clean all make_install_O: make install make_with_babeltrace_O: make LIBBABELTRACE=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
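For readers who want to scan a 'perf test' listing like the one above without reading every entry, here is a minimal sketch that tallies the result words. It assumes one entry per line in the shape "<id>: <description> : <status>", as in the transcript; the names `tally_perf_test` and `SAMPLE` are made up for this illustration and are not part of perf. It splits on the last colon of each line, so a status or reason that itself contains a colon would be misparsed.

```python
import re
from collections import Counter

# Assumed entry shape, based on the transcript above:
#   "<id>: <description> : <status>[ (reason)]"
# The greedy ".+" makes the split happen at the LAST colon, so descriptions
# such as "syscalls:sys_enter_openat event fields" stay in one piece.
ENTRY = re.compile(r"^\s*(\d+(?:\.\d+)?):\s+(.+):\s*(.*)$")


def tally_perf_test(text):
    """Count entries per status word; group headers (no status) are skipped."""
    counts = Counter()
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue
        status = m.group(3).strip()
        if not status:
            # e.g. "22: Watchpoint :" only introduces subtests 22.1, 22.2, ...
            continue
        counts[status.split()[0]] += 1
    return counts


# A few entries copied from the listings in this thread.
SAMPLE = """\
15: syscalls:sys_enter_openat event fields : Ok
22: Watchpoint :
22.1: Read Only Watchpoint : Skip
22.2: Write Only Watchpoint : Ok
57: builtin clang support : Skip (not compiled in)
71: Check open filename arg using perf trace + vfs_getname: Ok
"""

if __name__ == "__main__":
    for status, n in sorted(tally_perf_test(SAMPLE).items()):
        print(f"{status}: {n}")
```

Counting only the first word of the status keeps "Skip (not compiled in)" and a plain "Skip" in the same bucket.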
* [GIT PULL] perf/core improvements and fixes @ 2020-03-17 21:32 Arnaldo Carvalho de Melo 2020-03-19 14:03 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2020-03-17 21:32 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Alexey Budankov, Andi Kleen, disconnect3d, Ian Rogers, Jin Yao, Kan Liang, Leo Yan, Michael Petlan, Mike Leach, Thomas Richter, Arnaldo Carvalho de Melo Hi Ingo/Thomas, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit f787feff69c466dfc6f261c9632627e383b49187: perf block-info: Support color ops to print block percents in color (2020-03-09 21:43:25 -0300) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.7-20200317 for you to fetch changes up to 59a08b4b3b1a9374adacd13cd7544c03e5582e0e: perf expr: Fix copy/paste mistake (2020-03-17 18:01:40 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf record: Alexey Budankov: - Fix binding of AIO user space buffers to nodes maps: Dominik b. Czarnota: - Fix off by one in strncpy() size argument. Arnaldo Carvalho de Melo: - Use strstarts() to look for Android libraries. Ian Rogers: - Give synthetic mmap events an inode generation. man pages: Ian Rogers: - Set man page date to last git commit. perf test: Ian Rogers: - Print if shell directory isn't present. perf report: Jin Yao: - Fix no branch type statistics report issue. perf expr: Jiri Olsa: - Fix copy/paste mistake vendor events: Kan Liang: - Support metric constraints. vendor events intel: Kan Liang: - Add NO_NMI_WATCHDOG metric constraint. vendor events s390: Thomas Richter: - Add new deflate counters for IBM z15. ARM cs-etm: Leo Yan: - Last branch improvements. 
intel-pt: Adrian Hunter: - Update intel-pt.txt file with new location of the documentation. - Add Intel PT man page references. - Rename intel-pt.txt and put it in man page format. perl scripting: Michael Petlan: - Add common_callchain to fix argument order. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Adrian Hunter (3): perf intel-pt: Rename intel-pt.txt and put it in man page format perf intel-pt: Add Intel PT man page references perf intel-pt: Update intel-pt.txt file with new location of the documentation Alexey Budankov (1): perf record: Fix binding of AIO user space buffers to nodes Arnaldo Carvalho de Melo (1): perf map: Use strstarts() to look for Android libraries Ian Rogers (3): perf doc: Set man page date to last git commit perf test: Print if shell directory isn't present perf tools: Give synthetic mmap events an inode generation Jin Yao (1): perf report: Fix no branch type statistics report issue Jiri Olsa (1): perf expr: Fix copy/paste mistake Kan Liang (5): perf jevents: Support metric constraint perf metricgroup: Factor out metricgroup__add_metric_weak_group() perf util: Factor out sysctl__nmi_watchdog_enabled() perf metricgroup: Support metric constraint perf vendor events intel: Add NO_NMI_WATCHDOG metric constraint Leo Yan (5): perf cs-etm: Swap packets for instruction samples perf cs-etm: Continuously record last branch perf cs-etm: Correct synthesizing instruction samples perf cs-etm: Optimize copying last branches perf cs-etm: Fix unsigned variable comparison to zero Michael Petlan (1): perf scripting perl: Add common_callchain to fix argument order Thomas Richter (1): perf vendor events s390: Add new deflate counters for IBM z15 disconnect3d (1): perf map: Fix off by one in strncpy() size argument tools/perf/Documentation/Makefile | 5 +- tools/perf/Documentation/intel-pt.txt | 992 +------------------ tools/perf/Documentation/perf-inject.txt | 3 +- 
tools/perf/Documentation/perf-intel-pt.txt | 1007 ++++++++++++++++++++ tools/perf/Documentation/perf-record.txt | 2 +- tools/perf/Documentation/perf-report.txt | 3 +- tools/perf/Documentation/perf-script.txt | 2 +- tools/perf/builtin-report.c | 9 +- .../perf/pmu-events/arch/s390/cf_z15/crypto6.json | 8 +- .../perf/pmu-events/arch/s390/cf_z15/extended.json | 30 +- .../arch/x86/cascadelakex/clx-metrics.json | 3 +- .../pmu-events/arch/x86/skylake/skl-metrics.json | 3 +- .../pmu-events/arch/x86/skylakex/skx-metrics.json | 3 +- tools/perf/pmu-events/jevents.c | 19 +- tools/perf/pmu-events/jevents.h | 2 +- tools/perf/pmu-events/pmu-events.h | 1 + tools/perf/scripts/perl/check-perf-trace.pl | 6 +- tools/perf/scripts/perl/failed-syscalls.pl | 2 +- tools/perf/scripts/perl/rw-by-file.pl | 6 +- tools/perf/scripts/perl/rw-by-pid.pl | 10 +- tools/perf/scripts/perl/rwtop.pl | 10 +- tools/perf/scripts/perl/wakeup-latency.pl | 6 +- tools/perf/tests/builtin-test.c | 5 +- tools/perf/util/cs-etm.c | 157 ++- tools/perf/util/expr.l | 4 +- tools/perf/util/map.c | 8 +- tools/perf/util/metricgroup.c | 109 ++- tools/perf/util/mmap.c | 21 +- tools/perf/util/stat-display.c | 6 +- tools/perf/util/synthetic-events.c | 1 + tools/perf/util/util.c | 18 + tools/perf/util/util.h | 2 + 32 files changed, 1340 insertions(+), 1123 deletions(-) create mode 100644 tools/perf/Documentation/perf-intel-pt.txt Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. 
Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping to put this in place.

Clearlinux and debian:experimental are failing due to:

  `.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of /tmp/build/perf/util/scripting-engines/perf-in.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of /tmp/build/perf/util/scripting-engines/perf-in.o

Ubuntu 19.10 is failing when linking against libllvm, which isn't the default. This needs to be investigated; I haven't tested with CC=gcc, but it should be the same problem:

  + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= LIBCLANGLLVM=1 -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
  ...
/usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_ignoringImpCasts0Matcher::matches(clang::Expr const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const': (.text._ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x43): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const' /usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_hasLoopVariable0Matcher::matches(clang::CXXForRangeStmt const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const': (.text._ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x48): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const' ... It builds ok with the default set of options. 
# export PERF_TARBALL=http://192.168.122.1/perf/perf-5.6.0-rc4.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:3.11 : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0) 9 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.1 (git://git.alpinelinux.org/aports 7c78441134e54efbb34618f457d88c783c913361) (based on LLVM 9.0.1) 10 alt:p8 : Ok x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final) 11 alt:p9 : Ok x86_64-alt-linux-gcc (GCC) 8.3.1 20190507 (ALT p9 8.3.1-alt5), clang version 7.0.1 12 alt:sisyphus : Ok x86_64-alt-linux-gcc (GCC) 9.2.1 20200123 (ALT Sisyphus 9.2.1-alt3), clang version 9.0.1 13 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 14 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 15 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 16 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 17 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 18 centos:6 : Ok gcc 
(GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 19 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) 20 centos:8 : Ok gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), clang version 8.0.1 (Red Hat 8.0.1-1.module_el8.1.0+215+a01033fb) 21 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20200305 gcc_9_2_0_release-738-ge50627ff8c, clang version 9.0.1 22 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 23 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 24 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 25 debian:experimental : FAIL gcc (Debian 9.2.1-31) 9.2.1 20200306, clang version 9.0.1-9 26 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 9.2.1-28) 9.2.1 20200203 27 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 9.2.1-24) 9.2.1 20200117 28 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.2.1-24) 9.2.1 20200117 29 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) 30 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 31 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 32 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 33 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 34 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 35 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 36 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 37 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 38 fedora:29 
: Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 39 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 40 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 41 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 42 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.1 (Fedora 9.0.1-2.fc31) 43 fedora:32 : Ok gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.1.rc2.fc32) 44 fedora:rawhide : Ok gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.5.rc3.fc33) 45 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.2.0-r2 p3) 9.2.0 46 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 47 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 48 mageia:7 : Ok gcc (Mageia 8.4.0-1.mga7) 8.4.0, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 49 manjaro:latest : Ok gcc (Arch Linux 9.2.1+20200130-2) 9.2.1 20200130, clang version 9.0.1 50 openmandriva:cooker : Ok gcc (GCC) 10.0.0 20200301 (OpenMandriva), clang version 10.0.0 51 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 5.0.1 (tags/RELEASE_501/final 312548) 52 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 53 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20200128 [revision 83f65674e78d97d27537361de1a9d74067ff228d], clang version 9.0.1 54 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 55 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.3) 56 oraclelinux:8 : Ok gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4.5.0.5), clang version 8.0.1 (Red Hat 8.0.1-1.0.1.module+el8.1.0+5428+345cee14) 57 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang 
version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 58 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4 59 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 60 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 62 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 63 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 65 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 66 ubuntu:18.04 : Ok gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 67 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0 68 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0 69 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 70 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 71 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 72 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 73 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 74 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 75 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 76 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 77 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 78 
ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 79 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 80 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 81 ubuntu:19.10 : FAIL gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final) $ # uname -a Linux five 5.5.8-200.fc31.x86_64 #1 SMP Thu Mar 5 21:28:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 59a08b4b3b1a perf expr: Fix copy/paste mistake # perf version --build-options perf version 5.6.rc4.g59a08b4b3b1a dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : 
Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread maps : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Merge cpu map : Ok 53: Probe SDT events : Ok 54: is_printable_array : Ok 55: Print bitmap : Ok 56: perf hooks : Ok 57: builtin clang support : Skip (not compiled in) 58: unit_number__scnprintf : Ok 59: mem2node : Ok 60: time utils : Ok 61: Test jit_write_elf : Ok 62: maps__merge_in : Ok 63: x86 rdpmc : Ok 64: Convert perf time to TSC : Ok 65: DWARF unwind : Ok 66: x86 instruction decoder - 
new instructions : Ok 67: Intel PT packet decoder : Ok 68: x86 bp modify : Ok 69: probe libc's inet_pton & backtrace it with ping : Ok 70: Use vfs_getname probe to get syscall args filenames : Ok 71: Check open filename arg using perf trace + vfs_getname: Ok 72: Zstd perf.data compression/decompression : Ok 73: Add vfs_getname probe to get syscall args filenames : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_no_auxtrace_O: make NO_AUXTRACE=1 make_help_O: make help make_doc_O: make doc make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_no_slang_O: make NO_SLANG=1 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_tags_O: make tags make_no_demangle_O: make NO_DEMANGLE=1 make_install_prefix_O: make install prefix=/tmp/krava make_pure_O: make make_perf_o_O: make perf.o make_no_gtk2_O: make NO_GTK2=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_clean_all_O: make clean all make_no_newt_O: make NO_NEWT=1 make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_no_libunwind_O: make NO_LIBUNWIND=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_debug_O: make DEBUG=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_no_libelf_O: make NO_LIBELF=1 make_install_O: make install make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 make_no_libbpf_O: make NO_LIBBPF=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_util_map_o_O: make util/map.o make_install_bin_O: make install-bin make_no_libperl_O: make NO_LIBPERL=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_ui_O: make 
NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_with_babeltrace_O: make LIBBABELTRACE=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
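The container build table in the message above has one row per image, and only a few rows are failures. The following is a minimal sketch for pulling out the failing images, assuming each row sits on its own line in the shape "<index> <image:tag> : <Ok|FAIL> <compiler description>", as in the transcript. The names `parse_build_results`, `failing_images` and `SAMPLE` are made up for this illustration and are not part of the perf tooling or of the 'dm' script.

```python
import re

# Assumed row shape, based on the table above:
#   "<index> <image:tag> : <Ok|FAIL> <compiler description>"
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+:\s+(Ok|FAIL)\s+(.*)$")


def parse_build_results(text):
    """Return (image, status, compilers) for every line that matches ROW."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            _, image, status, compilers = m.groups()
            rows.append((image, status, compilers.strip()))
    return rows


def failing_images(text):
    """Return the image names whose status column is FAIL, in table order."""
    return [image for image, status, _ in parse_build_results(text)
            if status == "FAIL"]


# Three rows copied from the table in this message.
SAMPLE = """\
24 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final)
25 debian:experimental : FAIL gcc (Debian 9.2.1-31) 9.2.1 20200306, clang version 9.0.1-9
81 ubuntu:19.10 : FAIL gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final)
"""

if __name__ == "__main__":
    for image in failing_images(SAMPLE):
        print(image)
```

Lines that do not match the assumed shape are ignored rather than reported, which keeps the sketch tolerant of the surrounding shell prompts and prose.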
* Re: [GIT PULL] perf/core improvements and fixes 2020-03-17 21:32 Arnaldo Carvalho de Melo @ 2020-03-19 14:03 ` Ingo Molnar 2020-03-19 14:07 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 133+ messages in thread From: Ingo Molnar @ 2020-03-19 14:03 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Alexey Budankov, Andi Kleen, disconnect3d, Ian Rogers, Jin Yao, Kan Liang, Leo Yan, Michael Petlan, Mike Leach, Thomas Richter, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit f787feff69c466dfc6f261c9632627e383b49187: > > perf block-info: Support color ops to print block percents in color (2020-03-09 21:43:25 -0300) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.7-20200317 > > for you to fetch changes up to 59a08b4b3b1a9374adacd13cd7544c03e5582e0e: > > perf expr: Fix copy/paste mistake (2020-03-17 18:01:40 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > perf record: > > Alexey Budankov: > > - Fix binding of AIO user space buffers to nodes > > maps: > > Dominik b. Czarnota: > > - Fix off by one in strncpy() size argument. > > Arnaldo Carvalho de Melo: > > - Use strstarts() to look for Android libraries. > > Ian Rogers: > > - Give synthetic mmap events an inode generation. > > man pages: > > Ian Rogers: > > - Set man page date to last git commit. > > perf test: > > Ian Rogers: > > - Print if shell directory isn't present. > > perf report: > > Jin Yao: > > - Fix no branch type statistics report issue. 
> > perf expr: > > Jiri Olsa: > > - Fix copy/paste mistake > > vendor events: > > Kan Liang: > > - Support metric constraints. > > vendor events intel: > > Kan Liang: > > - Add NO_NMI_WATCHDOG metric constraint. > > vendor events s390: > > Thomas Richter: > > - Add new deflate counters for IBM z15. > > ARM cs-etm: > > Leo Yan: > > - Last branch improvements. > > intel-pt: > > Adrian Hunter: > > - Update intel-pt.txt file with new location of the documentation. > > - Add Intel PT man page references. > > - Rename intel-pt.txt and put it in man page format. > > perl scripting: > > Michael Petlan: > > - Add common_callchain to fix argument order. > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Adrian Hunter (3): > perf intel-pt: Rename intel-pt.txt and put it in man page format > perf intel-pt: Add Intel PT man page references > perf intel-pt: Update intel-pt.txt file with new location of the documentation > > Alexey Budankov (1): > perf record: Fix binding of AIO user space buffers to nodes > > Arnaldo Carvalho de Melo (1): > perf map: Use strstarts() to look for Android libraries > > Ian Rogers (3): > perf doc: Set man page date to last git commit > perf test: Print if shell directory isn't present > perf tools: Give synthetic mmap events an inode generation > > Jin Yao (1): > perf report: Fix no branch type statistics report issue > > Jiri Olsa (1): > perf expr: Fix copy/paste mistake > > Kan Liang (5): > perf jevents: Support metric constraint > perf metricgroup: Factor out metricgroup__add_metric_weak_group() > perf util: Factor out sysctl__nmi_watchdog_enabled() > perf metricgroup: Support metric constraint > perf vendor events intel: Add NO_NMI_WATCHDOG metric constraint > > Leo Yan (5): > perf cs-etm: Swap packets for instruction samples > perf cs-etm: Continuously record last branch > perf cs-etm: Correct synthesizing instruction samples > perf cs-etm: Optimize copying last 
branches > perf cs-etm: Fix unsigned variable comparison to zero > > Michael Petlan (1): > perf scripting perl: Add common_callchain to fix argument order > > Thomas Richter (1): > perf vendor events s390: Add new deflate counters for IBM z15 > > disconnect3d (1): > perf map: Fix off by one in strncpy() size argument > > tools/perf/Documentation/Makefile | 5 +- > tools/perf/Documentation/intel-pt.txt | 992 +------------------ > tools/perf/Documentation/perf-inject.txt | 3 +- > tools/perf/Documentation/perf-intel-pt.txt | 1007 ++++++++++++++++++++ > tools/perf/Documentation/perf-record.txt | 2 +- > tools/perf/Documentation/perf-report.txt | 3 +- > tools/perf/Documentation/perf-script.txt | 2 +- > tools/perf/builtin-report.c | 9 +- > .../perf/pmu-events/arch/s390/cf_z15/crypto6.json | 8 +- > .../perf/pmu-events/arch/s390/cf_z15/extended.json | 30 +- > .../arch/x86/cascadelakex/clx-metrics.json | 3 +- > .../pmu-events/arch/x86/skylake/skl-metrics.json | 3 +- > .../pmu-events/arch/x86/skylakex/skx-metrics.json | 3 +- > tools/perf/pmu-events/jevents.c | 19 +- > tools/perf/pmu-events/jevents.h | 2 +- > tools/perf/pmu-events/pmu-events.h | 1 + > tools/perf/scripts/perl/check-perf-trace.pl | 6 +- > tools/perf/scripts/perl/failed-syscalls.pl | 2 +- > tools/perf/scripts/perl/rw-by-file.pl | 6 +- > tools/perf/scripts/perl/rw-by-pid.pl | 10 +- > tools/perf/scripts/perl/rwtop.pl | 10 +- > tools/perf/scripts/perl/wakeup-latency.pl | 6 +- > tools/perf/tests/builtin-test.c | 5 +- > tools/perf/util/cs-etm.c | 157 ++- > tools/perf/util/expr.l | 4 +- > tools/perf/util/map.c | 8 +- > tools/perf/util/metricgroup.c | 109 ++- > tools/perf/util/mmap.c | 21 +- > tools/perf/util/stat-display.c | 6 +- > tools/perf/util/synthetic-events.c | 1 + > tools/perf/util/util.c | 18 + > tools/perf/util/util.h | 2 + > 32 files changed, 1340 insertions(+), 1123 deletions(-) > create mode 100644 tools/perf/Documentation/perf-intel-pt.txt Pulled this and the previous perf/core pull request into 
tip:perf/core, thanks Arnaldo! (You might want to double check my conflict resolution with perf/urgent, to tools/perf/util/map.c.) Thanks, Ingo ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes
  2020-03-19 14:03 ` Ingo Molnar
@ 2020-03-19 14:07   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-19 14:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Alexey Budankov, Andi Kleen, disconnect3d, Ian Rogers, Jin Yao, Kan Liang, Leo Yan, Michael Petlan, Mike Leach, Thomas Richter, Arnaldo Carvalho de Melo

On Thu, Mar 19, 2020 at 03:03:38PM +0100, Ingo Molnar wrote:
> * Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >  32 files changed, 1340 insertions(+), 1123 deletions(-)
> >  create mode 100644 tools/perf/Documentation/perf-intel-pt.txt

> Pulled this and the previous perf/core pull request into tip:perf/core, thanks Arnaldo!

> (You might want to double check my conflict resolution with perf/urgent,
> to tools/perf/util/map.c.)

I'll check, thanks for pulling the outstanding pull reqs!

- Arnaldo
* [GIT PULL] perf/core improvements and fixes
@ 2020-03-10 11:15 Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-10 11:15 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Jin Yao, Kan Liang, Michael Petlan, Ravi Bangoria, Steven Rostedt, Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

	Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit d46eec8e975a8180e178e01ba505801c44bc9a6c:

  Merge remote-tracking branch 'acme/perf/urgent' into perf/core (2020-03-04 10:29:19 -0300)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.7-20200310

for you to fetch changes up to f787feff69c466dfc6f261c9632627e383b49187:

  perf block-info: Support color ops to print block percents in color (2020-03-09 21:43:25 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf stat:

  Jin Yao:

  - Show percore counts in per CPU output.

perf report:

  Jin Yao:

  - Allow selecting which block info columns to report and their order.

  - Support color ops to print block percents in color.

  - Fix wrong block address comparison in block_info__cmp().

perf annotate:

  Ravi Bangoria:

  - Get rid of annotation->nr_jumps, unused.

expr:

  Jiri Olsa:

  - Move expr lexer to flex.

llvm:

  Arnaldo Carvalho de Melo:

  - Add debug hint message about missing kernel-devel package.

core:

  Kan Liang:

  - Initial patches to support the recently added PERF_SAMPLE_BRANCH_HW_INDEX
    kernel feature.

  - Add check for unexpected use of reserved members in event attr, so that in
    the future older perf tools will complain instead of silently trying to
    process unknown features.

libapi:

  Namhyung Kim:

  - Adopt cgroupsfs_find_mountpoint() from tools/perf/util/.
libperf:

  Michael Petlan:

  - Add counting example.

libtraceevent:

  Steven Rostedt (VMware):

  - Remove extra '\n' in print_event_time().

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Arnaldo Carvalho de Melo (2):
      perf llvm: Add debug hint message about missing kernel-devel package
      tools headers UAPI: Update tools's copy of linux/perf_event.h

Jin Yao (5):
      perf stat: Show percore counts in per CPU output
      perf block-info: Fix wrong block address comparison in block_info__cmp()
      perf diff: Use __block_info__cmp() to replace block_pair_cmp()
      perf block-info: Allow selecting which columns to report and its order
      perf block-info: Support color ops to print block percents in color

Jiri Olsa (5):
      perf expr: Add expr.c object
      perf expr: Move expr lexer to flex
      perf expr: Increase EXPR_MAX_OTHER to support metrics with more than 15 variables
      perf expr: Straighten expr__parse()/expr__find_other() interface
      perf expr: Make expr__parse() return -1 on error

Kan Liang (3):
      perf tools: Add hw_idx in struct branch_stack
      perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX
      perf header: Add check for unexpected use of reserved membrs in event attr

Michael Petlan (1):
      libperf: Add counting example

Namhyung Kim (1):
      tools lib api fs: Move cgroupsfs_find_mountpoint()

Ravi Bangoria (1):
      perf annotate: Get rid of annotation->nr_jumps

Steven Rostedt (VMware) (1):
      tools lib traceevent: Remove extra '\n' in print_event_time()

 tools/include/uapi/linux/perf_event.h | 8 +-
 tools/lib/api/fs/Build | 1 +
 tools/lib/api/fs/cgroup.c | 67 ++++++++
 tools/lib/api/fs/fs.h | 2 +
 tools/lib/perf/Documentation/examples/counting.c | 83 +++++++++
 tools/lib/traceevent/event-parse.c | 2 +-
 tools/perf/Documentation/perf-stat.txt | 9 +
 tools/perf/builtin-diff.c | 21 +--
 tools/perf/builtin-report.c | 21 ++-
 tools/perf/builtin-script.c | 70 ++++----
 tools/perf/builtin-stat.c | 4 +
 tools/perf/tests/expr.c | 10 +-
 tools/perf/tests/sample-parsing.c | 7 +-
 tools/perf/util/Build | 11 +-
 tools/perf/util/annotate.c | 2 -
 tools/perf/util/annotate.h | 1 -
 tools/perf/util/block-info.c | 106 +++++++-----
 tools/perf/util/block-info.h | 9 +-
 tools/perf/util/branch.h | 22 +++
 tools/perf/util/cgroup.c | 63 +------
 tools/perf/util/cs-etm.c | 2 +
 tools/perf/util/event.h | 1 +
 tools/perf/util/evsel.c | 20 ++-
 tools/perf/util/evsel.h | 6 +
 tools/perf/util/expr.c | 112 +++++++++++++
 tools/perf/util/expr.h | 8 +-
 tools/perf/util/expr.l | 114 +++++++++++++
 tools/perf/util/expr.y | 185 ++++-----------------
 tools/perf/util/header.c | 37 +++++
 tools/perf/util/hist.c | 3 +-
 tools/perf/util/intel-pt.c | 2 +
 tools/perf/util/llvm-utils.c | 2 +
 tools/perf/util/machine.c | 35 ++--
 tools/perf/util/perf_event_attr_fprintf.c | 1 +
 .../util/scripting-engines/trace-event-python.c | 30 ++--
 tools/perf/util/session.c | 8 +-
 tools/perf/util/stat-display.c | 33 +++-
 tools/perf/util/stat-shadow.c | 4 +-
 tools/perf/util/stat.h | 1 +
 tools/perf/util/synthetic-events.c | 6 +-
 40 files changed, 750 insertions(+), 379 deletions(-)
 create mode 100644 tools/lib/api/fs/cgroup.c
 create mode 100644 tools/lib/perf/Documentation/examples/counting.c
 create mode 100644 tools/perf/util/expr.c
 create mode 100644 tools/perf/util/expr.l

Test results:

The first ones are container based builds of tools/perf with and without libelf
support. Where clang is available, it is also used to build perf with/without
libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang
when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container
cluster. Those will come back later.
Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event_open syscall to check that the perf_event_attr fields are set up
as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one.

It is planned to have it run on each of the containers mentioned above, using
some container orchestration infrastructure. Get in contact if interested in
helping having this in place.

Clearlinux is failing due to:

  `.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of /tmp/build/perf/util/scripting-engines/perf-in.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of /tmp/build/perf/util/scripting-engines/perf-in.o

Ubuntu 19.10 is failing when linking against libllvm, which isn't the default,
needs to be investigated, haven't tested with CC=gcc, but should be the same
problem:

+ make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= LIBCLANGLLVM=1 -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
...
/usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_ignoringImpCasts0Matcher::matches(clang::Expr const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const': (.text._ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x43): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const' /usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_hasLoopVariable0Matcher::matches(clang::CXXForRangeStmt const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const': (.text._ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x48): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const' ... It builds ok with the default set of options. 
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.6.0-rc4.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:3.11 : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0) 9 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.1 (git://git.alpinelinux.org/aports 7c78441134e54efbb34618f457d88c783c913361) (based on LLVM 9.0.1) 10 alt:p8 : Ok x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final) 11 alt:p9 : Ok x86_64-alt-linux-gcc (GCC) 8.3.1 20190507 (ALT p9 8.3.1-alt5), clang version 7.0.1 12 alt:sisyphus : Ok x86_64-alt-linux-gcc (GCC) 9.2.1 20190827 (ALT Sisyphus 9.2.1-alt2), clang version 7.0.1 13 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 14 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 15 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 16 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 17 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 18 centos:6 : Ok gcc 
(GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 19 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) 20 centos:8 : Ok gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), clang version 8.0.1 (Red Hat 8.0.1-1.module_el8.1.0+215+a01033fb) 21 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20200214 gcc_9_2_0_release-615-g7866f9ebf1, clang version 9.0.1 22 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 23 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 24 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 25 debian:experimental : Ok gcc (Debian 9.2.1-28) 9.2.1 20200203, clang version 8.0.1-7 (tags/RELEASE_801/final) 26 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 27 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 28 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.2.1-24) 9.2.1 20200117 29 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909 30 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) 31 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 32 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 33 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 34 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 35 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 36 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 37 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 38 fedora:28 : Ok gcc 
(GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 39 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 40 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 41 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 42 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 43 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.1 (Fedora 9.0.1-2.fc31) 44 fedora:32 : Ok gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.1.rc2.fc32) 45 fedora:rawhide : Ok gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.3.rc2.fc33) 46 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.2.0-r2 p3) 9.2.0 47 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 48 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 49 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 50 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final) 51 openmandriva:cooker : Ok gcc (GCC) 10.0.0 20200216 (OpenMandriva), clang version 10.0.0 52 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 53 opensuse:15.1 : Ok gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 54 opensuse:15.2 : Ok gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 55 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 56 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20200128 [revision 83f65674e78d97d27537361de1a9d74067ff228d], clang version 9.0.1 57 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 
4.4.7-23.0.1) 58 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.3) 59 oraclelinux:8 : Ok gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4.5.0.5), clang version 8.0.1 (Red Hat 8.0.1-1.0.1.module+el8.1.0+5428+345cee14) 60 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 61 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4 62 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 63 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 65 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 66 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 67 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 68 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 69 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 70 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 74 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 75 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 76 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 77 ubuntu:18.04-x-s390 : Ok 
s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 78 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 79 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 80 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 81 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 82 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 83 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 84 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 85 ubuntu:19.10 : FAIL gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final) # # uname -a Linux five 5.5.5-200.fc31.x86_64 #1 SMP Wed Feb 19 23:28:07 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 f787feff69c4 perf block-info: Support color ops to print block percents in color # perf version --build-options perf version 5.6.rc4.gf787feff69c4 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat 
syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread maps : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr 
update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Merge cpu map : Ok 53: Probe SDT events : Ok 54: is_printable_array : Ok 55: Print bitmap : Ok 56: perf hooks : Ok 57: builtin clang support : Skip (not compiled in) 58: unit_number__scnprintf : Ok 59: mem2node : Ok 60: time utils : Ok 61: Test jit_write_elf : Ok 62: maps__merge_in : Ok 63: x86 rdpmc : Ok 64: Convert perf time to TSC : Ok 65: DWARF unwind : Ok 66: x86 instruction decoder - new instructions : Ok 67: Intel PT packet decoder : Ok 68: x86 bp modify : Ok 69: probe libc's inet_pton & backtrace it with ping : Ok 70: Use vfs_getname probe to get syscall args filenames : Ok 71: Check open filename arg using perf trace + vfs_getname: Ok 72: Zstd perf.data compression/decompression : Ok 73: Add vfs_getname probe to get syscall args filenames : Ok $ git log --oneline -1 f787feff69c4 (HEAD -> perf/core, quaco/perf/core) perf block-info: Support color ops to print block percents in color $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . 
make_no_libbionic_O: make NO_LIBBIONIC=1
make_install_prefix_O: make install prefix=/tmp/krava
make_no_libnuma_O: make NO_LIBNUMA=1
make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1
make_install_bin_O: make install-bin
make_with_clangllvm_O: make LIBCLANGLLVM=1
make_no_backtrace_O: make NO_BACKTRACE=1
make_doc_O: make doc
make_util_pmu_bison_o_O: make util/pmu-bison.o
make_no_libelf_O: make NO_LIBELF=1
make_perf_o_O: make perf.o
make_no_auxtrace_O: make NO_AUXTRACE=1
make_no_gtk2_O: make NO_GTK2=1
make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
make_install_prefix_slash_O: make install prefix=/tmp/krava/
make_tags_O: make tags
make_no_libunwind_O: make NO_LIBUNWIND=1
make_no_slang_O: make NO_SLANG=1
make_no_libaudit_O: make NO_LIBAUDIT=1
make_pure_O: make
make_clean_all_O: make clean all
make_no_libbpf_O: make NO_LIBBPF=1
make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1
make_no_libperl_O: make NO_LIBPERL=1
make_util_map_o_O: make util/map.o
make_with_babeltrace_O: make LIBBABELTRACE=1
make_help_O: make help
make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
make_no_libpython_O: make NO_LIBPYTHON=1
make_no_newt_O: make NO_NEWT=1
make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
make_install_O: make install
make_no_demangle_O: make NO_DEMANGLE=1
make_debug_O: make DEBUG=1
OK
make: Leaving directory '/home/acme/git/perf/tools/perf'
$
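The numbered container build summary in this message marks each image as Ok or FAIL. A minimal sketch of pulling out the non-Ok entries, assuming the summary is available with one entry per line in the form "<n> <image> : <status> <compiler info>"; the helper name failed_images is hypothetical and is not part of the dm tooling:

```shell
#!/bin/sh
# Hypothetical helper: print the image name of every summary row whose
# status column is not "Ok". Reads the summary from standard input.
failed_images() {
    # Fields: $1 = row number, $2 = image, $3 = ":", $4 = status.
    awk '$1 ~ /^[0-9]+$/ && $3 == ":" && $4 != "Ok" { print $2 }'
}

# Three rows copied from the summary, one per line.
failed_images <<'EOF'
  83 ubuntu:19.04-x-arm64 : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0
  84 ubuntu:19.04-x-hppa  : Ok   hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
  85 ubuntu:19.10         : FAIL gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final)
EOF
```

For the three sample rows this prints only ubuntu:19.10. The row-number check keeps unrelated lines, such as the uname or perf version output that follows the summary, from being reported.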
* [GIT PULL] perf/core improvements and fixes
@ 2020-01-16 13:48 Arnaldo Carvalho de Melo
  2020-01-20  8:23 ` Ingo Molnar
  0 siblings, 1 reply; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-01-16 13:48 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Andi Kleen, Andres Freund, Cengiz Can, Jann Horn, Jin Yao, Maciej S. Szmigiero, Michael Petlan, Ravi Bangoria, Thomas Richter, Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

	Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 53f3feeb7bd2d78039b3dc9ab158bad2a5dbe012:

  Merge tag 'perf-core-for-mingo-5.6-20200106' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2020-01-10 18:49:34 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.6-20200116

for you to fetch changes up to 8af19d66b956401bab1ef24049eec9421be93862:

  perf header: Use last modification time for timestamp (2020-01-15 10:17:20 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf report:

  Andi Kleen:

  - Clarify in help that --children is default.

  Jin Yao:

  - Fix no libunwind compiled warning breaking s390.

perf annotate/report/top:

  Andi Kleen:

  - Support --prefix/--prefix-strip, use it with objdump when doing disassembly.

perf c2c:

  Andres Freund:

  - Fix return type for histogram sorting comparison functions.

perf header:

  Michael Petlan:

  - Use last modification time for timestamp, i.e. st.st_mtime instead of the
    st_ctime.

perf beauty:

  Cengiz Can:

  - Fix sockaddr printf format for long integers.

libperf:

  Jiri Olsa:

  - Setup initial evlist::all_cpus value

perf parser:

  Jiri Olsa:

  - Use %define api.pure full instead of %pure-parser, nuking warning from
    bison about using deprecated stuff.
perf ui gtk:

  - Add missing zalloc object, fixing gtk browser build.

perf clang:

  Maciej S. Szmigiero:

  - Fix build issues with Clang 9 and 8+.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Andi Kleen (2):
      perf report: Clarify in help that --children is default
      perf tools: Support --prefix/--prefix-strip

Andres Freund (1):
      perf c2c: Fix return type for histogram sorting comparision functions

Cengiz Can (1):
      perf beauty sockaddr: Fix augmented syscall format warning

Jin Yao (1):
      perf report: Fix no libunwind compiled warning break s390 issue

Jiri Olsa (4):
      libperf: Setup initial evlist::all_cpus value
      perf tools: Use %define api.pure full instead of %pure-parser
      perf ui gtk: Add missing zalloc object
      perf/ui/gtk: Fix gtk2 build

Maciej S. Szmigiero (2):
      perf clang: Fix build with Clang 9
      tools build: Fix test-clang.cpp with Clang 8+

Michael Petlan (1):
      perf header: Use last modification time for timestamp

 tools/build/feature/Makefile | 2 +-
 tools/build/feature/test-clang.cpp | 6 ++++++
 tools/lib/perf/evlist.c | 3 +++
 tools/perf/Documentation/perf-annotate.txt | 6 ++++++
 tools/perf/Documentation/perf-report.txt | 6 ++++++
 tools/perf/Documentation/perf-top.txt | 6 ++++++
 tools/perf/builtin-annotate.c | 7 +++++++
 tools/perf/builtin-c2c.c | 10 ++++++----
 tools/perf/builtin-report.c | 16 ++++++++++++----
 tools/perf/builtin-top.c | 7 +++++++
 tools/perf/trace/beauty/sockaddr.c | 2 +-
 tools/perf/ui/gtk/Build | 7 ++++++-
 tools/perf/util/annotate.c | 19 +++++++++++++++++--
 tools/perf/util/annotate.h | 5 +++++
 tools/perf/util/c++/clang.cpp | 4 ++++
 tools/perf/util/expr.y | 3 ++-
 tools/perf/util/header.c | 2 +-
 tools/perf/util/parse-events.y | 2 +-
 18 files changed, 97 insertions(+), 16 deletions(-)

Test results:

The first ones are container based builds of tools/perf with and without libelf support.
Where clang is available, it is also used to build perf with/without libelf,
and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang
and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container
cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event_open syscall to check that the perf_event_attr fields are set up
as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one.

It is planned to have it run on each of the containers mentioned above, using
some container orchestration infrastructure. Get in contact if interested in
helping having this in place.

Clearlinux is failing due to:

  `.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of /tmp/build/perf/util/scripting-engines/perf-in.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of /tmp/build/perf/util/scripting-engines/perf-in.o

OpenMandriva Cooker works well with gcc, uncovers a bug where we have to get
compiler-clang.h from the kernel sources, will be fixed soon.
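The perf header change listed in this pull request uses st.st_mtime instead of st_ctime for the timestamp. The two fields can differ: st_mtime tracks the last content modification, while st_ctime also moves on metadata-only changes such as chmod. A minimal sketch of that difference, assuming GNU coreutils stat (the -c format option); it illustrates the stat fields only and is not the perf code itself:

```shell
#!/bin/sh
# Assumes GNU coreutils `stat`; other stat implementations use different flags.
f=$(mktemp)

# Set the modification time to a fixed point in the past:
# 2001-09-09 01:46:40 UTC, which is epoch second 1000000000.
TZ=UTC0 touch -t 200109090146.40 "$f"

# Metadata-only change: updates st_ctime, leaves st_mtime untouched.
chmod 644 "$f"

mtime=$(stat -c %Y "$f")   # st_mtime, seconds since the epoch
ctime=$(stat -c %Z "$f")   # st_ctime, seconds since the epoch

echo "mtime=$mtime"
echo "ctime=$ctime"

rm -f "$f"
```

The mtime value stays at the time set by touch, while ctime reflects when the inode was last changed, which here is the time the script ran.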
With the update of linux/linkage.h to move from ENTRY()/ENDPROC() to SYM_FUNC_START()/etc some of the older containers can't be used with clang, as the minimum version for the constructs used in the new linkage.h is 3.5, older versions (3.4, 3.4.2, etc) end up with: bench/../../arch/x86/lib/memcpy_64.S:44:14: error: unexpected token in '.type' directive .type MEMCPY STT_FUNC ; .size MEMCPY, .-MEMCPY ^ Ubuntu 19.10 is failing when linking against libllvm, which isn't the default, needs to be investigated, haven't tested with CC=gcc, but should be the same problem: + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= LIBCLANGLLVM=1 -C /git/linux/tools/perf O=/tmp/build/perf CC=clang ... /usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_ignoringImpCasts0Matcher::matches(clang::Expr const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const': (.text._ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal32matcher_ignoringImpCasts0Matcher7matchesERKNS_4ExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x43): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const' /usr/bin/ld: /usr/lib/llvm-9/lib/libclangAnalysis.a(ExprMutationAnalyzer.cpp.o): in function `clang::ast_matchers::internal::matcher_hasLoopVariable0Matcher::matches(clang::CXXForRangeStmt const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const': 
(.text._ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal31matcher_hasLoopVariable0Matcher7matchesERKNS_15CXXForRangeStmtEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x48): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::ast_type_traits::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const' ... It builds ok with the default set of options. # export PERF_TARBALL=http://192.168.124.1/perf/perf-5.5.0-rc3.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:3.11 : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0) 9 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (git://git.alpinelinux.org/aports 25c73ae7b95bdb42ae5f0ceac3b703e766582527) (based on LLVM 9.0.0) 10 alt:p8 : Ok x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final) 11 alt:p9 : Ok x86_64-alt-linux-gcc (GCC) 8.3.1 20190507 (ALT p9 8.3.1-alt5), clang version 7.0.1 12 
alt:sisyphus : Ok x86_64-alt-linux-gcc (GCC) 9.2.1 20190827 (ALT Sisyphus 9.2.1-alt2), clang version 7.0.1 13 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 14 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 15 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 16 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 17 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 18 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 19 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) 20 centos:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), clang version 7.0.1 (tags/RELEASE_701/final) 21 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20191210 gcc-9-branch@279166, clang version 9.0.0 (tags/RELEASE_900/final) 22 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 23 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 24 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 25 debian:experimental : Ok gcc (Debian 9.2.1-19) 9.2.1 20191109, clang version 8.0.1-4 (tags/RELEASE_801/final) 26 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 27 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 28 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.2.1-8) 9.2.1 20190909 29 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909 30 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) 31 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 32 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 
(tags/RELEASE_370/final) 33 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 34 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 35 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 36 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 37 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 38 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 39 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 40 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 41 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 42 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 43 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc31) 44 fedora:32 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 45 fedora:rawhide : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 46 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.2.0-r2 p3) 9.2.0 47 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 48 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 49 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 50 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final) 51 openmandriva:cooker : Ok gcc (GCC) 9.2.1 20191123 (OpenMandriva) 52 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], 
clang version 5.0.1 (tags/RELEASE_501/final 312548) 53 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238) 54 opensuse:15.2 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238) 55 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 56 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190903 [gcc-9-branch revision 275330], clang version 9.0.0 (tags/RELEASE_900/final 372316) 57 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 58 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.3) 59 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 60 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 61 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4 62 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 63 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 65 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 66 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 67 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 68 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 69 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 70 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 
ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 74 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 75 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 76 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 77 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 78 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 79 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 80 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 81 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 82 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 83 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 84 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 85 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final) # uname -a Linux quaco 5.5.0-rc6+ #2 SMP Tue Jan 14 13:13:43 -03 2020 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 8af19d66b956 perf header: Use last modification time for timestamp # perf version --build-options perf version 5.5.rc3.g8af19d66b956 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] 
# HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread maps : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: 
LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Merge cpu map : Ok 53: Probe SDT events : Ok 54: is_printable_array : Ok 55: Print bitmap : Ok 56: perf hooks : Ok 57: builtin clang support : Skip (not compiled in) 58: unit_number__scnprintf : Ok 59: mem2node : Ok 60: time utils : Ok 61: Test jit_write_elf : Ok 62: maps__merge_in : Ok 63: x86 rdpmc : Ok 64: Convert perf time to TSC : Ok 65: DWARF unwind : Ok 66: x86 instruction decoder - new instructions : Ok 67: Intel PT packet decoder : Ok 68: x86 bp modify : Ok 69: probe libc's inet_pton & backtrace it with ping : Ok 70: Use vfs_getname probe to get syscall args filenames : Ok 71: Add vfs_getname probe to get syscall args filenames : Ok 72: Check open filename arg using perf trace + vfs_getname: Ok 73: Zstd perf.data compression/decompression : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . 
- /home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP: make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP feature-dump make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP feature-dump make_no_libperl_O: make NO_LIBPERL=1 make_perf_o_O: make perf.o make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_doc_O: make doc make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_gtk2_O: make NO_GTK2=1 make_no_slang_O: make NO_SLANG=1 make_install_bin_O: make install-bin make_no_libpython_O: make NO_LIBPYTHON=1 make_help_O: make help make_install_O: make install make_debug_O: make DEBUG=1 make_cscope_O: make cscope make_util_map_o_O: make util/map.o make_no_newt_O: make NO_NEWT=1 make_clean_all_O: make clean all make_no_libunwind_O: make NO_LIBUNWIND=1 make_install_prefix_O: make install prefix=/tmp/krava make_no_libaudit_O: make NO_LIBAUDIT=1 make_pure_O: make make_tags_O: make tags make_no_libbpf_O: make NO_LIBBPF=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_no_demangle_O: make NO_DEMANGLE=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_no_libelf_O: make NO_LIBELF=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 - /home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC: make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC LDFLAGS='-static' feature-dump make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC LDFLAGS='-static' feature-dump make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_no_libbionic_O: make NO_LIBBIONIC=1 
make_no_backtrace_O: make NO_BACKTRACE=1
make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
make_install_prefix_slash_O: make install prefix=/tmp/krava/
OK
make: Leaving directory '/home/acme/git/perf/tools/perf'
$
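
As a rough illustration of what the build-test run above exercises, here is a minimal shell sketch that builds tools/perf once per feature set. It is not the actual tests/make machinery: the NO_*=1 variables are copied from the log above, while the build_cmd and run_all helpers, the SRC, OUT and RUN variables, and the /tmp/build/perf-features layout are assumptions made up for this example. By default it only prints the commands; setting RUN=1 executes them from the top of a kernel tree.

```shell
#!/bin/sh
# Sketch only: one perf build per feature set, each in its own O= directory.
# SRC, OUT, RUN and both helper functions are hypothetical names.

SRC=${SRC:-tools/perf}
OUT=${OUT:-/tmp/build/perf-features}

# Print the make command line for one feature set.
# $1 is a label for the build, $2 the (possibly empty) make variables.
build_cmd() {
    name=$1
    flags=$2
    echo "make -C $SRC O=$OUT/$name${flags:+ $flags}"
}

# Walk a small "label:make variables" table taken from the log above.
run_all() {
    while IFS=: read -r name flags; do
        cmd=$(build_cmd "$name" "$flags")
        if [ "${RUN:-0}" = 1 ]; then
            # Keep make away from the here-document feeding the loop.
            mkdir -p "$OUT/$name" && $cmd </dev/null || echo "FAIL: $name"
        else
            echo "$cmd"
        fi
    done <<EOF
pure:
no_libelf:NO_LIBELF=1
no_libbpf:NO_LIBBPF=1
no_scripts:NO_LIBPYTHON=1 NO_LIBPERL=1
EOF
}

run_all
```

Keeping each feature set in a separate O= directory means a failure in one configuration cannot be masked by objects left over from another.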
* Re: [GIT PULL] perf/core improvements and fixes
  2020-01-16 13:48 Arnaldo Carvalho de Melo
@ 2020-01-20  8:23 ` Ingo Molnar
From: Ingo Molnar @ 2020-01-20 8:23 UTC
To: Arnaldo Carvalho de Melo
Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Andi Kleen, Andres Freund, Cengiz Can, Jann Horn, Jin Yao, Maciej S. Szmigiero, Michael Petlan, Ravi Bangoria, Thomas Richter, Arnaldo Carvalho de Melo

* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo/Thomas,
>
> Please consider pulling,
>
> Best regards,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit 53f3feeb7bd2d78039b3dc9ab158bad2a5dbe012:
>
>   Merge tag 'perf-core-for-mingo-5.6-20200106' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2020-01-10 18:49:34 +0100)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.6-20200116
> 18 files changed, 97 insertions(+), 16 deletions(-)

Pulled, thanks a lot Arnaldo!

	Ingo

* [GIT PULL] perf/core improvements and fixes
@ 2020-01-06 16:06 Arnaldo Carvalho de Melo
  2020-01-10 17:50 ` Ingo Molnar
  2020-01-28 19:10 ` pr-tracker-bot
From: Arnaldo Carvalho de Melo @ 2020-01-06 16:06 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Alexey Budankov, Andi Kleen, Andrey Zhizhikin, David Ahern, Linus Torvalds, Vitaly Chikunov, Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

	Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit b9fb2de0115bbacab36da31fd10483ea66d9cfab:

  Merge tag 'perf-urgent-for-mingo-5.5-20191223' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2019-12-23 22:27:44 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.6-20200106

for you to fetch changes up to 6c4798d3f08b81c2c52936b10e0fa872590c96ae:

  tools lib: Fix builds when glibc contains strlcpy() (2020-01-06 11:46:10 -0300)

----------------------------------------------------------------
perf/core improvements and fixes.

perf record:

  Alexey Budankov:

  - Adapt affinity for machines with #CPUs > 1K to overcome the current
    1024 CPUs mask size limitation of the cpu_set_t type.

perf report/top TUI:

  Arnaldo Carvalho de Melo:

  - Make ENTER consistently present the pop up menu with and without
    call chains, to eliminate confusion. The menu remains available at
    all times via 'm', and '+' can be used to toggle just one call chain
    level, 'e' for all the call chains for a top level histogram entry
    and 'E' to expand all call chains in all top level entries.

    Extra info about these options was added to the pop up menu entries.

    Pressing 'k' serves as a special hotkey to go straight to the main
    vmlinux entries, to avoid having to press enter and then select
    "Zoom into the kernel DSO".

perf sched timehist:

  David Ahern:

  - Add support for filtering on CPU.

perf tests:

  Arnaldo Carvalho de Melo:

  - Show expected versus obtained values in bp_signal test.

libperf:

  Jiri Olsa:

  - Move to tools/lib/perf.

  - Add man pages.

libapi:

  Andrey Zhizhikin:

  - Fix gcc9 stringop-truncation compilation error.

tools lib:

  Vitaly Chikunov:

  - Fix builds when glibc contains strlcpy(), which is the case for ALT Linux.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Alexey Budankov (3):
      tools bitmap: Implement bitmap_equal() operation at bitmap API
      perf mmap: Declare type for cpu mask of arbitrary length
      perf record: Adapt affinity to machines with #CPUs > 1K

Andrey Zhizhikin (1):
      tools lib api fs: Fix gcc9 stringop-truncation compilation error

Arnaldo Carvalho de Melo (12):
      perf tests bp_signal: Show expected versus obtained values
      perf hists browser: Restore ESC as "Zoom out" of DSO/thread/etc
      perf report/top: Make ENTER consistently bring up menu
      perf report/top: Add menu entry for toggling callchain expansion
      perf report/top: Improve toggle callchain menu option
      perf hists browser: Generalize the do_zoom_dso() function
      perf report/top: Add 'k' hotkey to zoom directly into the kernel map
      perf hists browser: Allow passing an initial hotkey
      tools ui popup: Allow returning hotkeys
      perf report/top: Allow pressing hotkeys in the options popup menu
      perf report/top: Do not offer annotation for symbols without samples
      perf report/top: Make 'e' visible in the help and make it toggle showing callchains

David Ahern (1):
      perf sched timehist: Add support for filtering on CPU

Jiri Olsa (2):
      libperf: Move to tools/lib/perf
      libperf: Add man pages

Vitaly Chikunov (1):
      tools lib: Fix builds when glibc contains strlcpy()

 tools/include/linux/bitmap.h | 30 +++
tools/include/linux/string.h | 8 + tools/lib/api/fs/fs.c | 4 +- tools/lib/bitmap.c | 15 ++ tools/{perf/lib => lib/perf}/Build | 0 tools/lib/perf/Documentation/Makefile | 156 ++++++++++++ tools/lib/perf/Documentation/asciidoc.conf | 120 +++++++++ tools/lib/perf/Documentation/examples/sampling.c | 119 +++++++++ tools/lib/perf/Documentation/libperf-counting.txt | 211 ++++++++++++++++ tools/lib/perf/Documentation/libperf-sampling.txt | 243 ++++++++++++++++++ tools/lib/perf/Documentation/libperf.txt | 246 ++++++++++++++++++ tools/lib/perf/Documentation/manpage-1.72.xsl | 14 ++ tools/lib/perf/Documentation/manpage-base.xsl | 35 +++ .../perf/Documentation/manpage-bold-literal.xsl | 17 ++ tools/lib/perf/Documentation/manpage-normal.xsl | 13 + .../lib/perf/Documentation/manpage-suppress-sp.xsl | 21 ++ tools/{perf/lib => lib/perf}/Makefile | 7 +- tools/{perf/lib => lib/perf}/core.c | 0 tools/{perf/lib => lib/perf}/cpumap.c | 0 tools/{perf/lib => lib/perf}/evlist.c | 0 tools/{perf/lib => lib/perf}/evsel.c | 0 .../lib => lib/perf}/include/internal/cpumap.h | 0 .../lib => lib/perf}/include/internal/evlist.h | 0 .../lib => lib/perf}/include/internal/evsel.h | 0 .../{perf/lib => lib/perf}/include/internal/lib.h | 0 .../{perf/lib => lib/perf}/include/internal/mmap.h | 0 .../lib => lib/perf}/include/internal/tests.h | 0 .../lib => lib/perf}/include/internal/threadmap.h | 0 .../lib => lib/perf}/include/internal/xyarray.h | 0 tools/{perf/lib => lib/perf}/include/perf/core.h | 0 tools/{perf/lib => lib/perf}/include/perf/cpumap.h | 0 tools/{perf/lib => lib/perf}/include/perf/event.h | 0 tools/{perf/lib => lib/perf}/include/perf/evlist.h | 0 tools/{perf/lib => lib/perf}/include/perf/evsel.h | 0 tools/{perf/lib => lib/perf}/include/perf/mmap.h | 0 .../lib => lib/perf}/include/perf/threadmap.h | 0 tools/{perf/lib => lib/perf}/internal.h | 0 tools/{perf/lib => lib/perf}/lib.c | 0 tools/{perf/lib => lib/perf}/libperf.map | 0 tools/{perf/lib => lib/perf}/libperf.pc.template | 0 
tools/{perf/lib => lib/perf}/mmap.c | 0 tools/{perf/lib => lib/perf}/tests/Makefile | 2 +- tools/{perf/lib => lib/perf}/tests/test-cpumap.c | 0 tools/{perf/lib => lib/perf}/tests/test-evlist.c | 0 tools/{perf/lib => lib/perf}/tests/test-evsel.c | 0 .../{perf/lib => lib/perf}/tests/test-threadmap.c | 0 tools/{perf/lib => lib/perf}/threadmap.c | 0 tools/{perf/lib => lib/perf}/xyarray.c | 0 tools/lib/string.c | 7 + tools/perf/Documentation/perf-sched.txt | 4 + tools/perf/MANIFEST | 1 + tools/perf/Makefile.config | 2 +- tools/perf/Makefile.perf | 2 +- tools/perf/builtin-c2c.c | 4 +- tools/perf/builtin-record.c | 28 ++- tools/perf/builtin-sched.c | 13 + tools/perf/lib/Documentation/Makefile | 7 - tools/perf/lib/Documentation/man/libperf.rst | 100 -------- tools/perf/lib/Documentation/tutorial/tutorial.rst | 123 --------- tools/perf/tests/bp_signal.c | 10 +- tools/perf/ui/browsers/hists.c | 277 ++++++++++++++------- tools/perf/ui/browsers/hists.h | 2 +- tools/perf/ui/browsers/res_sample.c | 2 +- tools/perf/ui/browsers/scripts.c | 2 +- tools/perf/ui/tui/util.c | 12 +- tools/perf/ui/util.h | 2 +- tools/perf/util/mmap.c | 40 ++- tools/perf/util/mmap.h | 13 +- tools/perf/util/sort.c | 3 +- tools/perf/util/sort.h | 2 + 70 files changed, 1565 insertions(+), 352 deletions(-) rename tools/{perf/lib => lib/perf}/Build (100%) create mode 100644 tools/lib/perf/Documentation/Makefile create mode 100644 tools/lib/perf/Documentation/asciidoc.conf create mode 100644 tools/lib/perf/Documentation/examples/sampling.c create mode 100644 tools/lib/perf/Documentation/libperf-counting.txt create mode 100644 tools/lib/perf/Documentation/libperf-sampling.txt create mode 100644 tools/lib/perf/Documentation/libperf.txt create mode 100644 tools/lib/perf/Documentation/manpage-1.72.xsl create mode 100644 tools/lib/perf/Documentation/manpage-base.xsl create mode 100644 tools/lib/perf/Documentation/manpage-bold-literal.xsl create mode 100644 tools/lib/perf/Documentation/manpage-normal.xsl create mode 
100644 tools/lib/perf/Documentation/manpage-suppress-sp.xsl rename tools/{perf/lib => lib/perf}/Makefile (96%) rename tools/{perf/lib => lib/perf}/core.c (100%) rename tools/{perf/lib => lib/perf}/cpumap.c (100%) rename tools/{perf/lib => lib/perf}/evlist.c (100%) rename tools/{perf/lib => lib/perf}/evsel.c (100%) rename tools/{perf/lib => lib/perf}/include/internal/cpumap.h (100%) rename tools/{perf/lib => lib/perf}/include/internal/evlist.h (100%) rename tools/{perf/lib => lib/perf}/include/internal/evsel.h (100%) rename tools/{perf/lib => lib/perf}/include/internal/lib.h (100%) rename tools/{perf/lib => lib/perf}/include/internal/mmap.h (100%) rename tools/{perf/lib => lib/perf}/include/internal/tests.h (100%) rename tools/{perf/lib => lib/perf}/include/internal/threadmap.h (100%) rename tools/{perf/lib => lib/perf}/include/internal/xyarray.h (100%) rename tools/{perf/lib => lib/perf}/include/perf/core.h (100%) rename tools/{perf/lib => lib/perf}/include/perf/cpumap.h (100%) rename tools/{perf/lib => lib/perf}/include/perf/event.h (100%) rename tools/{perf/lib => lib/perf}/include/perf/evlist.h (100%) rename tools/{perf/lib => lib/perf}/include/perf/evsel.h (100%) rename tools/{perf/lib => lib/perf}/include/perf/mmap.h (100%) rename tools/{perf/lib => lib/perf}/include/perf/threadmap.h (100%) rename tools/{perf/lib => lib/perf}/internal.h (100%) rename tools/{perf/lib => lib/perf}/lib.c (100%) rename tools/{perf/lib => lib/perf}/libperf.map (100%) rename tools/{perf/lib => lib/perf}/libperf.pc.template (100%) rename tools/{perf/lib => lib/perf}/mmap.c (100%) rename tools/{perf/lib => lib/perf}/tests/Makefile (93%) rename tools/{perf/lib => lib/perf}/tests/test-cpumap.c (100%) rename tools/{perf/lib => lib/perf}/tests/test-evlist.c (100%) rename tools/{perf/lib => lib/perf}/tests/test-evsel.c (100%) rename tools/{perf/lib => lib/perf}/tests/test-threadmap.c (100%) rename tools/{perf/lib => lib/perf}/threadmap.c (100%) rename tools/{perf/lib => lib/perf}/xyarray.c 
(100%)
 delete mode 100644 tools/perf/lib/Documentation/Makefile
 delete mode 100644 tools/perf/lib/Documentation/man/libperf.rst
 delete mode 100644 tools/perf/lib/Documentation/tutorial/tutorial.rst

Test results:

The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and to build with LIBCLANGLLVM=1 (built-in clang) with both gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from an HTTP server to build them inside the container, to make it easier to build in a container cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping put this in place.

Clearlinux is failing due to:

  `.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of /tmp/build/perf/util/scripting-engines/perf-in.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of /tmp/build/perf/util/scripting-engines/perf-in.o

OpenMandriva Cooker works well with gcc, but it uncovers a bug where we have to get compiler-clang.h from the kernel sources; this will be fixed soon.

With the update of linux/linkage.h to move from ENTRY()/ENDPROC() to SYM_FUNC_START()/etc some of the older containers can't be used with clang, as the minimum version for the constructs used in the new linkage.h is 3.5; older versions (3.4, 3.4.2, etc.) end up with:

  bench/../../arch/x86/lib/memcpy_64.S:44:14: error: unexpected token in '.type' directive
  .type MEMCPY STT_FUNC ; .size MEMCPY, .-MEMCPY
               ^

  # export PERF_TARBALL=http://192.168.124.1/perf/perf-5.5.0-rc3.tar.xz
  # dm
   1 alpine:3.4   : Ok   gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
   2 alpine:3.5   : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
   3 alpine:3.6   : Ok   gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
   4 alpine:3.7   : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
   5 alpine:3.8   : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
   6 alpine:3.9   : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
   7 alpine:3.10  : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
   8 alpine:3.11  : Ok   gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0)
   9 alpine:edge  : Ok   gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (git://git.alpinelinux.org/aports
25c73ae7b95bdb42ae5f0ceac3b703e766582527) (based on LLVM 9.0.0) 10 alt:p8 : Ok x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final) 11 alt:p9 : Ok x86_64-alt-linux-gcc (GCC) 8.3.1 20190507 (ALT p9 8.3.1-alt5), clang version 7.0.1 12 alt:sisyphus : Ok x86_64-alt-linux-gcc (GCC) 9.2.1 20190827 (ALT Sisyphus 9.2.1-alt2), clang version 7.0.1 13 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 14 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 15 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 16 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 17 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 18 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 19 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) 20 centos:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), clang version 7.0.1 (tags/RELEASE_701/final) 21 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20191210 gcc-9-branch@279166, clang version 9.0.0 (tags/RELEASE_900/final) 22 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 23 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 24 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 25 debian:experimental : Ok gcc (Debian 9.2.1-19) 9.2.1 20191109, clang version 8.0.1-4 (tags/RELEASE_801/final) 26 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 27 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 28 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.2.1-8) 9.2.1 20190909 29 debian:experimental-x-mipsel : Ok 
mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909 30 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) 31 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 32 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 33 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 34 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 35 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 36 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 37 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 38 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 39 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 40 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 41 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 42 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 43 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc31) 44 fedora:32 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 45 fedora:rawhide : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 46 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.2.0-r2 p3) 9.2.0 47 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 48 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 49 mageia:7 : Ok gcc (Mageia 
8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 50 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final) 51 openmandriva:cooker : Ok gcc (GCC) 9.2.1 20191123 (OpenMandriva) 52 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 53 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238) 54 opensuse:15.2 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238) 55 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 56 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190903 [gcc-9-branch revision 275330], clang version 9.0.0 (tags/RELEASE_900/final 372316) 57 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 58 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.3) 59 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 60 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 61 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4 62 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 63 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 65 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 66 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 67 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 
20160609 68 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 69 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 70 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 74 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 75 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 76 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 77 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 78 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 79 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 80 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 81 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 82 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 83 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 84 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 85 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final) # # uname -a Linux quaco 5.5.0-rc4+ #2 SMP Thu Jan 2 11:17:21 -03 2020 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 6c4798d3f08b tools lib: Fix builds when glibc contains strlcpy() # perf version --build-options perf version 5.5.rc3.g6c4798d3f08b dwarf: [ on ] # HAVE_DWARF_SUPPORT 
dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist 
entries : Ok 30: Lookup mmap thread : Ok 31: Share thread maps : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Merge cpu map : Ok 53: Probe SDT events : Ok 54: is_printable_array : Ok 55: Print bitmap : Ok 56: perf hooks : Ok 57: builtin clang support : Skip (not compiled in) 58: unit_number__scnprintf : Ok 59: mem2node : Ok 60: time utils : Ok 61: Test jit_write_elf : Ok 62: maps__merge_in : Ok 63: x86 rdpmc : Ok 64: Convert perf time to TSC : Ok 65: DWARF unwind : Ok 66: x86 instruction decoder - new instructions : Ok 67: Intel PT packet decoder : Ok 68: x86 bp modify : Ok 69: probe libc's inet_pton & backtrace it with ping : Ok 70: Use vfs_getname probe to get syscall args filenames : Ok 71: Add vfs_getname probe to get syscall args filenames : Ok 72: Check open filename arg using perf trace + vfs_getname: Ok 73: Zstd perf.data compression/decompression : Ok $ time make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . 
  make_util_pmu_bison_o_O: make util/pmu-bison.o
  make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1
  make_doc_O: make doc
  make_no_libbionic_O: make NO_LIBBIONIC=1
  make_no_libbpf_O: make NO_LIBBPF=1
  make_install_prefix_slash_O: make install prefix=/tmp/krava/
  make_cscope_O: make cscope
  make_clean_all_O: make clean all
  make_install_bin_O: make install-bin
  make_no_libpython_O: make NO_LIBPYTHON=1
  make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1
  make_util_map_o_O: make util/map.o
  make_perf_o_O: make perf.o
  make_with_babeltrace_O: make LIBBABELTRACE=1
  make_no_newt_O: make NO_NEWT=1
  make_debug_O: make DEBUG=1
  make_no_libaudit_O: make NO_LIBAUDIT=1
  make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
  make_no_backtrace_O: make NO_BACKTRACE=1
  make_with_clangllvm_O: make LIBCLANGLLVM=1
  make_no_libunwind_O: make NO_LIBUNWIND=1
  make_install_O: make install
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
  make_help_O: make help
  make_no_libnuma_O: make NO_LIBNUMA=1
  make_no_slang_O: make NO_SLANG=1
  make_no_demangle_O: make NO_DEMANGLE=1
  make_no_libperl_O: make NO_LIBPERL=1
  make_no_gtk2_O: make NO_GTK2=1
  make_tags_O: make tags
  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
  make_no_auxtrace_O: make NO_AUXTRACE=1
  make_pure_O: make
  make_no_libelf_O: make NO_LIBELF=1
  make_install_prefix_O: make install prefix=/tmp/krava
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $
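The 'perf test' listing in this message can be tallied mechanically. What follows is only a minimal sketch in Python, not something shipped with perf: it assumes the output has one entry per line, as perf prints it, in the form `<id>: <description> : <status>`. The function names are made up for illustration, and the `FAILED!` status is an assumption, since no failure appears in the listing. Group headers such as `22: Watchpoint :` carry no status and are skipped.

```python
import re
from collections import Counter

# <id>: <description> : <status> [(note)]
# The id may be a sub-test such as "22.1"; the description may contain colons.
LINE_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?):\s*(.*?)\s*:\s*(Ok|Skip|FAILED!)?\s*(?:\((.*)\))?\s*$"
)


def parse_perf_test(output):
    """Return (id, description, status) tuples for entries that have a status."""
    results = []
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match is None or match.group(3) is None:
            continue  # not a test line, or a group header without a status
        results.append((match.group(1), match.group(2), match.group(3)))
    return results


def summarize(results):
    """Count how many entries ended in each status."""
    return Counter(status for _, _, status in results)
```

Feeding it the text captured from `perf test` and printing `summarize(parse_perf_test(text))` gives a per-status count, which makes a new `Skip` or failure easy to spot between two runs.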
* Re: [GIT PULL] perf/core improvements and fixes
  2020-01-06 16:06 Arnaldo Carvalho de Melo
@ 2020-01-10 17:50 ` Ingo Molnar
  2020-01-28 19:10 ` pr-tracker-bot
  1 sibling, 0 replies; 133+ messages in thread
From: Ingo Molnar @ 2020-01-10 17:50 UTC
To: Arnaldo Carvalho de Melo
Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams,
	linux-kernel, linux-perf-users, Alexey Budankov, Andi Kleen,
	Andrey Zhizhikin, David Ahern, Linus Torvalds, Vitaly Chikunov,
	Arnaldo Carvalho de Melo

* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo/Thomas,
>
> 	Please consider pulling,
>
> Best regards,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit b9fb2de0115bbacab36da31fd10483ea66d9cfab:
>
>   Merge tag 'perf-urgent-for-mingo-5.5-20191223' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2019-12-23 22:27:44 +0100)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.6-20200106
>
> for you to fetch changes up to 6c4798d3f08b81c2c52936b10e0fa872590c96ae:
>
>   tools lib: Fix builds when glibc contains strlcpy() (2020-01-06 11:46:10 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes.
>
> perf record:
>
>   Alexey Budankov:
>
>   - Adapt affinity for machines with #CPUs > 1K to overcome current 1024 CPUs
>     mask size limitation of cpu_set_t type.
>
> perf report/top TUI:
>
>   Arnaldo Carvalho de Melo:
>
>   - Make ENTER consistently present the pop up menu with and without call
>     chains, to eliminate confusion. The menu remains available at all times
>     via 'm', and '+' can be used to toggle just one call chain level, 'e' for all
>     the call chains for a top level histogram entry and 'E' to expand all call
>     chains in all top level entries. Extra info about these options was added to
>     the pop up menu entries. Pressing 'k' serves as special hotkey to go straight
>     to the main vmlinux entries, to avoid having to press enter and then select
>     "Zoom into the kernel DSO".
>
> perf sched timehist:
>
>   David Ahern:
>
>   - Add support for filtering on CPU.
>
> perf tests:
>
>   Arnaldo Carvalho de Melo:
>
>   - Show expected versus obtained values in bp_signal test.
>
> libperf:
>
>   Jiri Olsa:
>
>   - Move to tools/lib/perf.
>
>   - Add man pages.
>
> libapi:
>
>   Andrey Zhizhikin:
>
>   - Fix gcc9 stringop-truncation compilation error.
>
> tools lib:
>
>   Vitaly Chikunov:
>
>   - Fix builds when glibc contains strlcpy(), which is the case for ALT Linux.
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> ----------------------------------------------------------------
> Alexey Budankov (3):
>       tools bitmap: Implement bitmap_equal() operation at bitmap API
>       perf mmap: Declare type for cpu mask of arbitrary length
>       perf record: Adapt affinity to machines with #CPUs > 1K
>
> Andrey Zhizhikin (1):
>       tools lib api fs: Fix gcc9 stringop-truncation compilation error
>
> Arnaldo Carvalho de Melo (12):
>       perf tests bp_signal: Show expected versus obtained values
>       perf hists browser: Restore ESC as "Zoom out" of DSO/thread/etc
>       perf report/top: Make ENTER consistently bring up menu
>       perf report/top: Add menu entry for toggling callchain expansion
>       perf report/top: Improve toggle callchain menu option
>       perf hists browser: Generalize the do_zoom_dso() function
>       perf report/top: Add 'k' hotkey to zoom directly into the kernel map
>       perf hists browser: Allow passing an initial hotkey
>       tools ui popup: Allow returning hotkeys
>       perf report/top: Allow pressing hotkeys in the options popup menu
>       perf report/top: Do not offer annotation for symbols without samples
>       perf report/top: Make 'e' visible in the help and make it toggle showing callchains
>
> David Ahern (1):
>       perf sched timehist: Add support for filtering on CPU
>
> Jiri Olsa (2):
>       libperf: Move to tools/lib/perf
>       libperf: Add man pages
>
> Vitaly Chikunov (1):
>       tools lib: Fix builds when glibc contains strlcpy()
>  70 files changed, 1565 insertions(+), 352 deletions(-)

Pulled, thanks a lot Arnaldo!

	Ingo
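The quoted shortlog lists "tools bitmap: Implement bitmap_equal() operation at bitmap API" as groundwork for CPU masks larger than the 1024 bits of cpu_set_t. The comparison idea can be sketched as follows. This is an illustration in Python, not the C code from tools/lib: it assumes 64-bit words, and `cpus_to_bitmap()` is a hypothetical helper added only to build test inputs. Whole words are compared directly, and the last, partial word is compared under a mask so that bits beyond `nbits` are ignored.

```python
BITS_PER_LONG = 64  # assumption: 64-bit words, as on x86_64


def bitmap_equal(a, b, nbits):
    """Return True if the first nbits bits of word lists a and b are equal."""
    full_words, tail_bits = divmod(nbits, BITS_PER_LONG)
    # Compare the completely used words first.
    if a[:full_words] != b[:full_words]:
        return False
    # Then compare only the valid low bits of the final, partial word.
    if tail_bits:
        mask = (1 << tail_bits) - 1
        return (a[full_words] & mask) == (b[full_words] & mask)
    return True


def cpus_to_bitmap(cpus, nbits):
    """Hypothetical helper: build a word list with one bit set per CPU number."""
    nwords = (nbits + BITS_PER_LONG - 1) // BITS_PER_LONG
    words = [0] * nwords
    for cpu in cpus:
        words[cpu // BITS_PER_LONG] |= 1 << (cpu % BITS_PER_LONG)
    return words
```

With masks sized from the number of CPUs rather than a fixed type, a CPU number such as 1500 is representable, and two masks can still be compared cheaply before deciding whether an affinity change is needed.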
* Re: [GIT PULL] perf/core improvements and fixes
  2020-01-06 16:06 Arnaldo Carvalho de Melo
  2020-01-10 17:50 ` Ingo Molnar
@ 2020-01-28 19:10 ` pr-tracker-bot
  1 sibling, 0 replies; 133+ messages in thread
From: pr-tracker-bot @ 2020-01-28 19:10 UTC
Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Namhyung Kim,
	Clark Williams, linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Alexey Budankov, Andi Kleen,
	Andrey Zhizhikin, David Ahern, Linus Torvalds, Vitaly Chikunov,
	Arnaldo Carvalho de Melo

The pull request you sent on Mon, 6 Jan 2020 13:06:45 -0300:

> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.6-20200106

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/53f3feeb7bd2d78039b3dc9ab158bad2a5dbe012

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
* [GIT PULL] perf/core improvements and fixes
@ 2019-12-03 13:55 Arnaldo Carvalho de Melo
  2019-12-04  7:51 ` Ingo Molnar
  0 siblings, 1 reply; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-12-03 13:55 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Arnaldo Carvalho de Melo, Andi Kleen,
	Ian Rogers, Sudip Mukherjee, Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

	Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit e680a41fcaf07ccac8817c589fc4824988b48eac:

  Merge tag 'perf-core-for-mingo-5.5-20191128' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2019-11-29 06:56:05 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191203

for you to fetch changes up to 15b3904f8e884e0d34d5f09906cf6526d0b889a2:

  libtraceevent: Copy pkg-config file to output folder when using O= (2019-12-02 21:58:20 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf report/top:

  - Fix segfault due to missing initialization of the recently introduced struct
    map_symbol 'maps' field in append_inlines(), when running with DWARF
    callchains.

perf stat:

  Andi Kleen:

  - Affinity based optimizations for sessions with many events in machines with
    large core counts, avoiding an excessive number of IPIs.

libtraceevent:

  Sudip Mukherjee:

  - Fix installation with O=.

  - Copy pkg-config file to output folder when using O=.

perf bench:

  Arnaldo Carvalho de Melo:

  - Update the copies of x86's mem{cpy,set}_64.S. Because they now use new
    constructs from linux/linkage.h, update that header too. This raises the
    minimum clang version needed to build perf to 3.5, as clang 3.4, found in
    some of the container images used to test build perf, can't grok STT_FUNC
    as a token in .type lines.

ABI headers:

  Arnaldo Carvalho de Melo:

  - Sync x86's msr-index.h copy with the kernel sources, making new MSRs, such
    as IA32_TSX_CTRL, usable in filter expressions in 'perf trace'.

  - Sync linux/fscrypt.h, linux/stat.h, sched.h and the kvm headers.

perf trace:

  Arnaldo Carvalho de Melo:

  - Add CLEAR_SIGHAND support for clone's flags arg.

perf kvm:

  Arnaldo Carvalho de Melo:

  - Clarify the 'perf kvm' -i and -o command line options.

perf test:

  Ian Rogers:

  - Move test functionality into a 'perf test' entry.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Andi Kleen (10):
      perf cpumap: Maintain cpumaps ordered and without dups
      perf evlist: Maintain evlist->all_cpus
      perf evsel: Add iterator to iterate over events ordered by CPU
      perf evsel: Add functions to close evsel on a CPU
      perf stat: Use affinity for closing file descriptors
      perf stat: Factor out open error handling
      perf stat: Use affinity for opening events
      perf stat: Use affinity for reading
      perf evsel: Add functions to enable/disable for a specific CPU
      perf stat: Use affinity for enabling/disabling events

Arnaldo Carvalho de Melo (10):
      perf machine: Fill map_symbol->maps in append_inlines() to fix segfault
      perf bench: Update the copies of x86's mem{cpy,set}_64.S
      tools arch x86: Sync the msr-index.h copy with the kernel sources
      tools headers uapi: Sync linux/fscrypt.h with the kernel sources
      tools headers uapi: Sync linux/stat.h with the kernel sources
      tools headers kvm: Sync kvm headers with the kernel sources
      tools headers UAPI: Sync sched.h with the kernel
      perf beauty: Add CLEAR_SIGHAND support for clone's flags arg
      tools arch x86: Sync asm/cpufeatures.h with the kernel sources
      perf kvm: Clarify the 'perf kvm' -i and -o command line options

Ian Rogers (1):
      perf jit: Move test functionality in to a test

Sudip Mukherjee (2):
      libtraceevent: Fix lib installation with O=
      libtraceevent: Copy pkg-config file to output folder when using O=
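
The Andi Kleen entries in the shortlog above name three building blocks: cpumaps kept ordered and without duplicates, an evlist-wide `all_cpus` map, and an iterator over events ordered by CPU. The sketch below shows how those pieces could fit together. It is written in Python purely for illustration and is not the libperf code; the dictionary layout and function names are assumptions. The stated goal in the changelog is fewer IPIs, and the reading assumed here is that walking CPU by CPU lets the tool set its affinity once per CPU and then handle every event on that CPU locally.

```python
def normalize(cpus):
    """Return a CPU list ordered ascending and without duplicates."""
    return sorted(set(cpus))


def all_cpus(event_cpus):
    """Union of every event's CPU list, again ordered and without duplicates."""
    union = set()
    for cpus in event_cpus.values():
        union.update(cpus)
    return sorted(union)


def by_cpu(event_cpus):
    """Yield (cpu, event) pairs grouped by CPU rather than by event.

    A caller could change its affinity once per distinct CPU value and then
    open, read, enable or close each event yielded for that CPU.
    """
    maps = {name: set(cpus) for name, cpus in event_cpus.items()}
    for cpu in all_cpus(event_cpus):
        for name, cpus in maps.items():
            if cpu in cpus:
                yield cpu, name
```

Because the maps are ordered and deduplicated, the outer loop visits each CPU exactly once, so the number of affinity changes grows with the CPU count instead of with events times CPUs.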

 tools/arch/arm/include/uapi/asm/kvm.h     |   3 +-
 tools/arch/arm64/include/uapi/asm/kvm.h   |   5 +-
 tools/arch/powerpc/include/uapi/asm/kvm.h |   3 +
 tools/arch/x86/include/asm/cpufeatures.h  |   3 +
 tools/arch/x86/include/asm/msr-index.h    |  18 ++
 tools/arch/x86/lib/memcpy_64.S            |  20 +--
 tools/arch/x86/lib/memset_64.S            |  16 +-
 tools/include/uapi/linux/fscrypt.h        |   3 +-
 tools/include/uapi/linux/kvm.h            |  11 ++
 tools/include/uapi/linux/sched.h          |  60 +++++--
 tools/include/uapi/linux/stat.h           |   2 +-
 tools/lib/traceevent/Makefile             |   6 +-
 tools/perf/Documentation/perf-kvm.txt     |   5 +-
 tools/perf/arch/arm/tests/regs_load.S     |   4 +-
 tools/perf/arch/arm64/tests/regs_load.S   |   4 +-
 tools/perf/arch/x86/tests/regs_load.S     |   8 +-
 tools/perf/builtin-record.c               |   2 +-
 tools/perf/builtin-stat.c                 | 288 +++++++++++++++++++++---------
 tools/perf/check-headers.sh               |   4 +-
 tools/perf/lib/cpumap.c                   |  73 +++++++-
 tools/perf/lib/evlist.c                   |   1 +
 tools/perf/lib/evsel.c                    |  76 ++++++--
 tools/perf/lib/include/internal/evlist.h  |   1 +
 tools/perf/lib/include/perf/cpumap.h      |   2 +
 tools/perf/lib/include/perf/evsel.h       |   3 +
 tools/perf/tests/Build                    |   1 +
 tools/perf/tests/builtin-test.c           |   9 +
 tools/perf/tests/cpumap.c                 |  16 ++
 tools/perf/tests/event-times.c            |   4 +-
 tools/perf/tests/genelf.c                 |  51 ++++++
 tools/perf/tests/tests.h                  |   2 +
 tools/perf/trace/beauty/clone.c           |   1 +
 tools/perf/util/cpumap.h                  |   1 +
 tools/perf/util/evlist.c                  | 113 +++++++++++-
 tools/perf/util/evlist.h                  |  11 +-
 tools/perf/util/evsel.c                   |  35 +++-
 tools/perf/util/evsel.h                   |   9 +-
 tools/perf/util/genelf.c                  |  46 -----
 tools/perf/util/include/linux/linkage.h   |  89 ++++++++-
 tools/perf/util/machine.c                 |   1 +
 tools/perf/util/stat.c                    |   5 +-
 tools/perf/util/stat.h                    |   3 +-
 42 files changed, 789 insertions(+), 229 deletions(-)
 create mode 100644 tools/perf/tests/genelf.c

Test results:

The first ones are container based builds of tools/perf with and without libelf support.
Where clang is available, it is also used to build perf with/without libelf, and
to build with LIBCLANGLLVM=1 (built-in clang), with both gcc and clang, when
clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from an http server to
build them inside the container, to make it easier to build in a container
cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to the lack of multi-arch devel
packages, which are available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications and then intercept the
sys_perf_event_open syscall to check that the perf_event_attr fields are set up
as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build
tools/perf/ with a variety of feature sets, exercising the build with an
incomplete set of features as well as with a complete one. It is planned to
have it run on each of the containers mentioned above, using some container
orchestration infrastructure. Get in contact if interested in helping to get
this in place.

Clearlinux is failing when building with libpython, but that is not a perf
regression; I will try to remove the one compiler warning that is causing the
problem when building some of the glue code in the python files, outside perf.

OpenMandriva Cooker works well with gcc, but uncovers a bug where we have to get
compiler-clang.h from the kernel sources; this will be fixed soon.
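
The container results that follow can be classified along the lines just described (cross builds, images where clang is also used). This is a minimal sketch in Python, not part of the `dm` tooling, whose sources are not shown in this message. It assumes one result per line in the form `<index> <image> : <status> <toolchain>`, and all function names are made up for illustration. The cross-build test uses the naming stated above: images with `-x-ARCH` in the tag plus the android ones.

```python
import re

# <index> <image> : <status> <toolchain description>
ROW_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s*:\s*(\S+)\s+(.*)$")


def parse_dm(output):
    """Parse result rows into dictionaries, ignoring lines that do not match."""
    rows = []
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if match:
            index, image, status, toolchain = match.groups()
            rows.append({"index": int(index), "image": image,
                         "status": status, "toolchain": toolchain})
    return rows


def cross_builds(rows):
    """Images that are cross builds according to their name."""
    return [r["image"] for r in rows
            if "-x-" in r["image"] or r["image"].startswith("android-ndk")]


def gcc_only(rows):
    """Images whose toolchain column reports no clang version."""
    return [r["image"] for r in rows if "clang version" not in r["toolchain"]]


def not_ok(rows):
    """Images whose status is anything other than Ok."""
    return [r["image"] for r in rows if r["status"] != "Ok"]
```

Running `not_ok(parse_dm(text))` on a saved listing gives an empty list when every container built, and `gcc_only()` shows which images exercised only one compiler.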

With the update of linux/linkage.h to move from ENTRY()/ENDPROC() to
SYM_FUNC_START()/etc., some of the older containers can't be used with clang, as
the minimum version for the constructs used in the new linkage.h is 3.5; older
versions (3.4, 3.4.2, etc.) end up with:

  bench/../../arch/x86/lib/memcpy_64.S:44:14: error: unexpected token in '.type' directive
  .type MEMCPY STT_FUNC ; .size MEMCPY, .-MEMCPY
               ^

Finally, the build-tests and container tests were performed with the following
two fixes (different sha, same contents), which are not in this patch series
and will go through the bpf/net trees. The 'perf test' was performed with what
is in this series, though.

  $ git log --oneline -2
  e1bc15a8e7d1 (HEAD -> perf/core) libbpf: Use PRIu64 for sym->st_value to fix build on 32-bit arches
  0d0f9df96c5a libbpf: Fix up generation of bpf_helper_defs.h
  $

  [root@quaco ~]# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.4.0.tar.xz
  [root@quaco ~]# time dm
   1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
   2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
   3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
   4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
   5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
   6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
   7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
   8 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (git://git.alpinelinux.org/aports 25c73ae7b95bdb42ae5f0ceac3b703e766582527) (based on LLVM 9.0.0)
   9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2),
clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) 16 centos:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), clang version 7.0.1 (tags/RELEASE_701/final) 17 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20191121 gcc-9-branch@278551, clang version 9.0.0 (tags/RELEASE_900/final) 18 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 19 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 20 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 21 debian:experimental : Ok gcc (Debian 9.2.1-19) 9.2.1 20191109, clang version 8.0.1-4 (tags/RELEASE_801/final) 22 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 24 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.2.1-8) 9.2.1 20190909 25 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909 26 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) 27 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 28 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 29 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 30 fedora:24-x-ARC-uClibc : Ok 
arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 31 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 32 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 33 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 34 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 35 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 36 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 37 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 39 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc31) 40 fedora:32 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 41 fedora:rawhide : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 42 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.2.0-r2 p3) 9.2.0 43 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 44 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 45 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 46 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final) 47 openmandriva:cooker : Ok gcc (GCC) 9.2.1 20191123 (OpenMandriva) 48 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 49 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 
(tags/RELEASE_701/final 349238) 50 opensuse:15.2 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238) 51 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 52 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190903 [gcc-9-branch revision 275330], clang version 9.0.0 (tags/RELEASE_900/final 372316) 53 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 54 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.3) 55 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 56 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 57 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4 58 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 59 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 62 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 63 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 65 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 66 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 
7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 74 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 75 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 76 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 77 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 78 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 79 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 80 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 81 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final) # # uname -a Linux quaco 5.4.0+ #1 SMP Wed Nov 27 12:05:27 -03 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 15b3904f8e88 libtraceevent: Copy pkg-config file to output folder when using O= # perf version --build-options perf version 5.4.g15b3904f8e88 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # 
HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread maps : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF 
prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Merge cpu map : Ok 53: Probe SDT events : Ok 54: is_printable_array : Ok 55: Print bitmap : Ok 56: perf hooks : Ok 57: builtin clang support : Skip (not compiled in) 58: unit_number__scnprintf : Ok 59: mem2node : Ok 60: time utils : Ok 61: Test jit_write_elf : Ok 62: maps__merge_in : Ok 63: x86 rdpmc : Ok 64: Convert perf time to TSC : Ok 65: DWARF unwind : Ok 66: x86 instruction decoder - new instructions : Ok 67: Intel PT packet decoder : Ok 68: x86 bp modify : Ok 69: probe libc's inet_pton & backtrace it with ping : Ok 70: Use vfs_getname probe to get syscall args filenames : Ok 71: Add vfs_getname probe to get syscall args filenames : Ok 72: Check open filename arg using perf trace + vfs_getname: Ok 73: Zstd perf.data compression/decompression : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . 
make_no_libpython_O: make NO_LIBPYTHON=1 make_perf_o_O: make perf.o make_no_libnuma_O: make NO_LIBNUMA=1 make_help_O: make help make_no_backtrace_O: make NO_BACKTRACE=1 make_no_slang_O: make NO_SLANG=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_libelf_O: make NO_LIBELF=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_no_newt_O: make NO_NEWT=1 make_debug_O: make DEBUG=1 make_tags_O: make tags make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 make_doc_O: make doc make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_no_libbpf_O: make NO_LIBBPF=1 make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_demangle_O: make NO_DEMANGLE=1 make_install_bin_O: make install-bin make_pure_O: make make_util_pmu_bison_o_O: make util/pmu-bison.o make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_util_map_o_O: make util/map.o make_clean_all_O: make clean all make_install_prefix_O: make install prefix=/tmp/krava make_cscope_O: make cscope make_no_libperl_O: make NO_LIBPERL=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_no_gtk2_O: make NO_GTK2=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_install_O: make install OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes
  2019-12-03 13:55 Arnaldo Carvalho de Melo
@ 2019-12-04 7:51 ` Ingo Molnar
  0 siblings, 0 replies; 133+ messages in thread
From: Ingo Molnar @ 2019-12-04 7:51 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams,
	linux-kernel, linux-perf-users, Andi Kleen, Ian Rogers,
	Sudipm Mukherjee, Arnaldo Carvalho de Melo

* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo/Thomas,
>
> 	Please consider pulling,
>
> Best regards,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit e680a41fcaf07ccac8817c589fc4824988b48eac:
>
>   Merge tag 'perf-core-for-mingo-5.5-20191128' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2019-11-29 06:56:05 +0100)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191203
>
> for you to fetch changes up to 15b3904f8e884e0d34d5f09906cf6526d0b889a2:
>
>   libtraceevent: Copy pkg-config file to output folder when using O= (2019-12-02 21:58:20 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> perf report/top:
>
>   - Fix segfault due to missing initialization of recently introduced
>     struct map_symbol 'maps' field in append_inlines(), when running
>     with DWARF callchains.
>
> perf stat:
>
>   Andi Kleen:
>
>   - Affinity based optimizations for sessions with many events in
>     machines with large core counts, avoiding excessive number of IPIs.
>
> libtraceevent:
>
>   - Sudip Mukherjee:
>
>   - Fix installation with O=.
>
>   - Copy pkg-config file to output folder when using O=.
>
> perf bench:
>
>   Arnaldo Carvalho de Melo:
>
>   - Update the copies of x86's mem{cpy,set}_64.S, and because that
>     now uses new stuff in linux/linkage.h, update that header too, which
>     made the minimal clang version to build perf to be 3.5, as
>     3.4 as found in some of the container images used to test build perf
>     can't grok STT_FUNC as a token in .type lines.
>
> ABI headers:
>
>   Arnaldo Carvalho de Melo:
>
>   - Sync x86's msr-index.h copy with the kernel sources, resulting
>     in new MSRs to be usable in filter expressions in 'perf trace',
>     such as IA32_TSX_CTRL.
>
>   - Sync linux/fscrypt.h, linux/stat.h, sched.h and the kvm headers.
>
> perf trace:
>
>   Arnaldo Carvalho de Melo:
>
>   - Add CLEAR_SIGHAND support for clone's flags arg
>
> perf kvm:
>
>   Arnaldo Carvalho de Melo:
>
>   - Clarify the 'perf kvm' -i and -o command line options
>
> perf test:
>
>   Ian Rogers:
>
>   - Move test functionality in to a 'perf test' entry.
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> ----------------------------------------------------------------
> Andi Kleen (10):
>       perf cpumap: Maintain cpumaps ordered and without dups
>       perf evlist: Maintain evlist->all_cpus
>       perf evsel: Add iterator to iterate over events ordered by CPU
>       perf evsel: Add functions to close evsel on a CPU
>       perf stat: Use affinity for closing file descriptors
>       perf stat: Factor out open error handling
>       perf stat: Use affinity for opening events
>       perf stat: Use affinity for reading
>       perf evsel: Add functions to enable/disable for a specific CPU
>       perf stat: Use affinity for enabling/disabling events
>
> Arnaldo Carvalho de Melo (10):
>       perf machine: Fill map_symbol->maps in append_inlines() to fix segfault
>       perf bench: Update the copies of x86's mem{cpy,set}_64.S
>       tools arch x86: Sync the msr-index.h copy with the kernel sources
>       tools headers uapi: Sync linux/fscrypt.h with the kernel sources
>       tools headers uapi: Sync linux/stat.h with the kernel sources
>       tools headers kvm: Sync kvm headers with the kernel sources
>       tools headers UAPI: Sync sched.h with the kernel
>       perf beauty: Add CLEAR_SIGHAND support for clone's flags arg
>       tools arch x86: Sync asm/cpufeatures.h with the kernel sources
>       perf kvm: Clarify the 'perf kvm' -i and -o command line options
>
> Ian Rogers (1):
>       perf jit: Move test functionality in to a test
>
> Sudip Mukherjee (2):
>       libtraceevent: Fix lib installation with O=
>       libtraceevent: Copy pkg-config file to output folder when using O=
>
>  tools/arch/arm/include/uapi/asm/kvm.h | 3 +-
>  tools/arch/arm64/include/uapi/asm/kvm.h | 5 +-
>  tools/arch/powerpc/include/uapi/asm/kvm.h | 3 +
>  tools/arch/x86/include/asm/cpufeatures.h | 3 +
>  tools/arch/x86/include/asm/msr-index.h | 18 ++
>  tools/arch/x86/lib/memcpy_64.S | 20 +--
>  tools/arch/x86/lib/memset_64.S | 16 +-
>  tools/include/uapi/linux/fscrypt.h | 3 +-
>  tools/include/uapi/linux/kvm.h | 11 ++
>  tools/include/uapi/linux/sched.h | 60 +++++--
>  tools/include/uapi/linux/stat.h | 2 +-
>  tools/lib/traceevent/Makefile | 6 +-
>  tools/perf/Documentation/perf-kvm.txt | 5 +-
>  tools/perf/arch/arm/tests/regs_load.S | 4 +-
>  tools/perf/arch/arm64/tests/regs_load.S | 4 +-
>  tools/perf/arch/x86/tests/regs_load.S | 8 +-
>  tools/perf/builtin-record.c | 2 +-
>  tools/perf/builtin-stat.c | 288 +++++++++++++++++++++---------
>  tools/perf/check-headers.sh | 4 +-
>  tools/perf/lib/cpumap.c | 73 +++++++-
>  tools/perf/lib/evlist.c | 1 +
>  tools/perf/lib/evsel.c | 76 ++++++--
>  tools/perf/lib/include/internal/evlist.h | 1 +
>  tools/perf/lib/include/perf/cpumap.h | 2 +
>  tools/perf/lib/include/perf/evsel.h | 3 +
>  tools/perf/tests/Build | 1 +
>  tools/perf/tests/builtin-test.c | 9 +
>  tools/perf/tests/cpumap.c | 16 ++
>  tools/perf/tests/event-times.c | 4 +-
>  tools/perf/tests/genelf.c | 51 ++++++
>  tools/perf/tests/tests.h | 2 +
>  tools/perf/trace/beauty/clone.c | 1 +
>  tools/perf/util/cpumap.h | 1 +
>  tools/perf/util/evlist.c | 113 +++++++++++-
>  tools/perf/util/evlist.h | 11 +-
>  tools/perf/util/evsel.c | 35 +++-
>  tools/perf/util/evsel.h | 9 +-
>  tools/perf/util/genelf.c | 46 -----
>  tools/perf/util/include/linux/linkage.h | 89 ++++++++-
>  tools/perf/util/machine.c | 1 +
>  tools/perf/util/stat.c | 5 +-
>  tools/perf/util/stat.h | 3 +-
>  42 files changed, 789 insertions(+), 229 deletions(-)
>  create mode 100644 tools/perf/tests/genelf.c

Pulled, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes
@ 2019-11-28 13:40 Arnaldo Carvalho de Melo
  2019-11-29 5:58 ` Ingo Molnar
  0 siblings, 1 reply; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-28 13:40 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter,
	Alexei Starovoitov, Andi Kleen, Andrii Nakryiko,
	Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

	Please consider pulling, this has a merge with mainline to pick bpf
stuff, and the build-test and container build tests were performed with
two extra patches I cooked to fix libbpf issues in some odd 32-bit
arches and on generation of some bpf helpers headers that will hit
mainline via the bpf/net trees.

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 2ea352d5960ad469f5712cf3e293db97beac4e01:

  Merge remote-tracking branch 'torvalds/master' into perf/core (2019-11-26 11:06:19 -0300)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191128

for you to fetch changes up to 5172672da02e483d9b3c4d814c3482d0c8ffb1a6:

  perf script: Fix invalid LBR/binary mismatch error (2019-11-28 08:08:38 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf script:

  Adrian Hunter:

  - Fix brstackinsn for AUXTRACE.

  - Fix invalid LBR/binary mismatch error.

perf diff:

  Arnaldo Carvalho de Melo:

  - Use llabs() with 64-bit values, fixing the build in some 32-bit
    architectures.

perf pmu:

  Andi Kleen:

  - Use file system cache to optimize sysfs access.

x86:

  Adrian Hunter:

  - Add some more Intel instructions to the opcode map and to the perf
    test entry:

      gf2p8affineinvqb, gf2p8affineqb, gf2p8mulb, v4fmaddps,
      v4fmaddss, v4fnmaddps, v4fnmaddss, vaesdec, vaesdeclast, vaesenc,
      vaesenclast, vcvtne2ps2bf16, vcvtneps2bf16, vdpbf16ps,
      vgf2p8affineinvqb, vgf2p8affineqb, vgf2p8mulb, vp2intersectd,
      vp2intersectq, vp4dpwssd, vp4dpwssds, vpclmulqdq, vpcompressb,
      vpcompressw, vpdpbusd, vpdpbusds, vpdpwssd, vpdpwssds, vpexpandb,
      vpexpandw, vpopcntb, vpopcntd, vpopcntq, vpopcntw, vpshldd, vpshldq,
      vpshldvd, vpshldvq, vpshldvw, vpshldw, vpshrdd, vpshrdq, vpshrdvd,
      vpshrdvq, vpshrdvw, vpshrdw, vpshufbitqmb.

perf affinity:

  Andi Kleen:

  - Add infrastructure to save/restore affinity

perf maps:

  Arnaldo Carvalho de Melo:

  - Merge 'struct maps' with 'struct map_groups', as there is a
    1x1 relationship, simplifying code overall.

perf build:

  Jiri Olsa:

  - Allow linking with libbpf dynamically.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (4):
      x86/insn: Add some more Intel instructions to the opcode map
      x86/insn: perf tools: Add some more instructions to the new instructions test
      perf script: Fix brstackinsn for AUXTRACE
      perf script: Fix invalid LBR/binary mismatch error

Andi Kleen (2):
      perf pmu: Use file system cache to optimize sysfs access
      perf affinity: Add infrastructure to save/restore affinity

Arnaldo Carvalho de Melo (15):
      perf script: Move map__fprintf_srccode() to near its only user
      perf map: Ditch leftover map__reloc_vmlinux() prototype
      perf map: Remove needless struct forward declarations
      perf map: Remove unused functions
      perf maps: Merge 'struct maps' with 'struct map_groups'
      perf thread: Rename thread->mg to thread->maps
      perf addr_location: Rename al->mg to al->maps
      perf map_symbol: Rename ms->mg to ms->maps
      perf maps: Rename 'mg' variables to 'maps'
      perf maps: Rename map_groups.h to maps.h
      perf tests: Rename thread-mg-share to thread-maps-share
      perf tests: Rename tests/map_groups.c to tests/maps.c
      perf diff: Use llabs() with 64-bit values
      perf diff: Use llabs() with 64-bit values
      perf regs: Make perf_reg_name() return "unknown" instead of NULL

Jiri Olsa (1):
      perf tools: Allow to link with libbpf dynamicaly

 arch/x86/lib/x86-opcode-map.txt | 44 +-
 tools/arch/x86/lib/x86-opcode-map.txt | 44 +-
 tools/build/Makefile.feature | 3 +-
 tools/build/feature/Makefile | 4 +
 tools/build/feature/test-libbpf.c | 7 +
 tools/perf/Makefile.config | 10 +
 tools/perf/Makefile.perf | 6 +-
 tools/perf/arch/arm/tests/dwarf-unwind.c | 4 +-
 tools/perf/arch/arm64/tests/dwarf-unwind.c | 4 +-
 tools/perf/arch/powerpc/tests/dwarf-unwind.c | 4 +-
 tools/perf/arch/s390/annotate/instructions.c | 2 +-
 tools/perf/arch/x86/tests/dwarf-unwind.c | 4 +-
 tools/perf/arch/x86/tests/insn-x86-dat-32.c | 366 ++++++++++++
 tools/perf/arch/x86/tests/insn-x86-dat-64.c | 484 +++++++++++++++
 tools/perf/arch/x86/tests/insn-x86-dat-src.c | 655 +++++++++++++++++++++
 tools/perf/arch/x86/util/event.c | 5 +-
 tools/perf/builtin-diff.c | 6 +-
 tools/perf/builtin-report.c | 7 +-
 tools/perf/builtin-script.c | 46 +-
 tools/perf/tests/Build | 4 +-
 tools/perf/tests/builtin-test.c | 8 +-
 tools/perf/tests/code-reading.c | 2 +-
 tools/perf/tests/{map_groups.c => maps.c} | 26 +-
 tools/perf/tests/tests.h | 4 +-
 .../{thread-mg-share.c => thread-maps-share.c} | 36 +-
 tools/perf/tests/vmlinux-kallsyms.c | 9 +-
 tools/perf/ui/browsers/annotate.c | 2 +-
 tools/perf/ui/stdio/hist.c | 4 +-
 tools/perf/util/Build | 2 +
 tools/perf/util/affinity.c | 73 +++
 tools/perf/util/affinity.h | 17 +
 tools/perf/util/annotate.c | 8 +-
 tools/perf/util/bpf-event.c | 4 +-
 tools/perf/util/callchain.c | 8 +-
 tools/perf/util/cs-etm.c | 2 +-
 tools/perf/util/db-export.c | 12 +-
 tools/perf/util/event.c | 14 +-
 tools/perf/util/fncache.c | 63 ++
 tools/perf/util/fncache.h | 7 +
 tools/perf/util/hist.c | 8 +-
 tools/perf/util/intel-pt.c | 2 +-
 tools/perf/util/machine.c | 80 ++-
 tools/perf/util/machine.h | 10 +-
 tools/perf/util/map.c | 223 ++-----
 tools/perf/util/map.h | 14 +-
 tools/perf/util/map_groups.h | 106 ----
 tools/perf/util/map_symbol.h | 4 +-
 tools/perf/util/maps.h | 87 +++
 tools/perf/util/perf_regs.h | 2 +-
 tools/perf/util/pmu.c | 34 +-
 tools/perf/util/probe-event.c | 4 +-
 tools/perf/util/python-ext-sources | 1 +
 .../util/scripting-engines/trace-event-python.c | 2 +-
 tools/perf/util/srccode.c | 9 +-
 tools/perf/util/symbol-elf.c | 16 +-
 tools/perf/util/symbol.c | 91 ++-
 tools/perf/util/symbol.h | 6 +-
 tools/perf/util/synthetic-events.c | 2 +-
 tools/perf/util/thread-stack.c | 4 +-
 tools/perf/util/thread.c | 38 +-
 tools/perf/util/thread.h | 4 +-
 tools/perf/util/unwind-libdw.c | 4 +-
 tools/perf/util/unwind-libunwind-local.c | 22 +-
 tools/perf/util/unwind-libunwind.c | 36 +-
 tools/perf/util/unwind.h | 27 +-
 tools/perf/util/vdso.c | 2 +-
 66 files changed, 2230 insertions(+), 618 deletions(-)
 create mode 100644 tools/build/feature/test-libbpf.c
 rename tools/perf/tests/{map_groups.c => maps.c} (83%)
 rename tools/perf/tests/{thread-mg-share.c => thread-maps-share.c} (64%)
 create mode 100644 tools/perf/util/affinity.c
 create mode 100644 tools/perf/util/affinity.h
 create mode 100644 tools/perf/util/fncache.c
 create mode 100644 tools/perf/util/fncache.h
 delete mode 100644 tools/perf/util/map_groups.h
 create mode 100644 tools/perf/util/maps.h

Test results:

The first ones are container based builds of tools/perf with and without
libelf support. Where clang is available, it is also used to build perf
with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang)
with gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching
from using the sources in a local volume to fetching them from an HTTP
server to build it inside the container, to make it easier to build in
a container cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and
those may not have all the features built, due to lack of multi-arch
devel packages, available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf
commands with a variety of command line event specifications to then
intercept the sys_perf_event syscall to check that the perf_event_attr
fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build
tools/perf/ with a variety of feature sets, exercising the build with an
incomplete set of features as well as with a complete one.

It is planned to have it run on each of the containers mentioned above,
using some container orchestration infrastructure. Get in contact if
interested in helping having this in place.

Clearlinux is failing when building with libpython, but that is not a
perf regression, will try to remove one compiler warning that is causing
the problem when building some of the glue code files in the python
files, outside perf.

OpenMandriva Cooker works well with gcc, uncovers a bug where we have to
get compiler-clang.h from the kernel sources, will be fixed soon.

Finally the build-tests and container tests were performed with the
following two fixes, that are not in this patch series, will go thru the
bpf/net trees:

  $ git log --oneline -2
  e1bc15a8e7d1 (HEAD -> perf/core) libbpf: Use PRIu64 for sym->st_value to fix build on 32-bit arches
  0d0f9df96c5a libbpf: Fix up generation of bpf_helper_defs.h
  $

The 'perf test' was performed with what is in this series tho.
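
For readers following the 32-bit build fixes named in this message (llabs() for 64-bit values in 'perf diff', PRIu64 for sym->st_value in libbpf), here is a minimal sketch of the two idioms. It is not taken from either patch: abs64() and format_sym_value() are hypothetical helper names, and the plain uint64_t argument is only a stand-in for an ELF symbol value.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not from the perf sources.
 * llabs() takes a long long, so the full 64-bit value is used even on
 * ABIs where long is 32 bits wide and labs() would see a truncated value.
 * The caller must not pass INT64_MIN, whose absolute value is not
 * representable.
 */
int64_t abs64(int64_t v)
{
	return llabs(v);
}

/*
 * Hypothetical helper, not from libbpf.
 * PRIu64 expands to the right printf conversion for uint64_t on the
 * target, so the format string matches the argument on both 32-bit and
 * 64-bit builds instead of assuming "unsigned long".
 * Returns what snprintf() returns: the length of the formatted text.
 */
int format_sym_value(char *buf, size_t len, uint64_t st_value)
{
	return snprintf(buf, len, "%" PRIu64, st_value);
}
```

A caller would pass a signed 64-bit difference to abs64() and a buffer plus a 64-bit value to format_sym_value(); both compile unchanged for 32-bit and 64-bit targets, which is the property the two fixes above are after.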
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.4.0.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (git://git.alpinelinux.org/aports 25c73ae7b95bdb42ae5f0ceac3b703e766582527) (based on LLVM 9.0.0) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 centos:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), clang version 7.0.1 (tags/RELEASE_701/final) 17 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20191121 gcc-9-branch@278551, clang version 9.0.0 (tags/RELEASE_900/final) 18 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) 
(based on LLVM 3.5.0) 19 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 20 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 21 debian:experimental : Ok gcc (Debian 9.2.1-19) 9.2.1 20191109, clang version 8.0.1-4 (tags/RELEASE_801/final) 22 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 24 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.2.1-8) 9.2.1 20190909 25 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909 26 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 27 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 28 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 29 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 30 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 31 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 32 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 33 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 34 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 35 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 36 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 37 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:30-x-ARC-uClibc : Ok 
arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 39 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc31) 40 fedora:32 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 41 fedora:rawhide : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 42 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.2.0-r2 p3) 9.2.0 43 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 44 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 45 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 46 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final) 47 openmandriva:cooker : Ok gcc (GCC) 9.2.1 20191123 (OpenMandriva) 48 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 49 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238) 50 opensuse:15.2 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238) 51 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 52 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190903 [gcc-9-branch revision 275330], clang version 9.0.0 (tags/RELEASE_900/final 372316) 53 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 54 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.3) 55 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 56 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 57 ubuntu:14.04 : Ok gcc (Ubuntu 
4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 58 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 59 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 62 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 63 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 65 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 66 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 74 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 75 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 76 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 
(tags/RELEASE_700/final) 77 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 78 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 79 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 80 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 81 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final) # # uname -a Linux quaco 5.4.0+ #1 SMP Wed Nov 27 12:05:27 -03 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 5172672da02e perf script: Fix invalid LBR/binary mismatch error # perf version --build-options perf version 5.4.g5172672da02e dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data 
reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread maps : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: maps__merge_in : Ok 61: x86 rdpmc : Ok 62: 
Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_clean_all_O: make clean all make_pure_O: make make_no_auxtrace_O: make NO_AUXTRACE=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_no_demangle_O: make NO_DEMANGLE=1 make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_libelf_O: make NO_LIBELF=1 make_help_O: make help make_doc_O: make doc make_no_libbionic_O: make NO_LIBBIONIC=1 - /home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC: make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC LDFLAGS='-static' feature-dump make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC LDFLAGS='-static' feature-dump make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_newt_O: make NO_NEWT=1 make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_util_map_o_O: make util/map.o make_install_bin_O: make install-bin make_debug_O: make DEBUG=1 make_no_libbpf_O: make NO_LIBBPF=1 make_cscope_O: make cscope make_util_pmu_bison_o_O: make util/pmu-bison.o make_install_O: make install make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_install_prefix_O: make install 
prefix=/tmp/krava make_no_libunwind_O: make NO_LIBUNWIND=1 make_no_slang_O: make NO_SLANG=1 make_perf_o_O: make perf.o make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_no_gtk2_O: make NO_GTK2=1 make_no_libperl_O: make NO_LIBPERL=1 make_tags_O: make tags OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes
  2019-11-28 13:40 Arnaldo Carvalho de Melo
@ 2019-11-29 5:58 ` Ingo Molnar
  0 siblings, 0 replies; 133+ messages in thread
From: Ingo Molnar @ 2019-11-29 5:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams,
      linux-kernel, linux-perf-users, Adrian Hunter,
      Alexei Starovoitov, Andi Kleen, Andrii Nakryiko,
      Arnaldo Carvalho de Melo

* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo/Thomas,
>
> Please consider pulling, this has a merge with mainline to pick
> bpf stuff, and the build-test and container build tests were performed
> with two extra patches I cooked to fix libbpf issues in some odd 32-bit
> arches and on generation of some bpf helpers headers that will hit
> mainline via the bpf/net trees.
>
> Best regards,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit 2ea352d5960ad469f5712cf3e293db97beac4e01:
>
>   Merge remote-tracking branch 'torvalds/master' into perf/core (2019-11-26 11:06:19 -0300)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191128
>
> for you to fetch changes up to 5172672da02e483d9b3c4d814c3482d0c8ffb1a6:
>
>   perf script: Fix invalid LBR/binary mismatch error (2019-11-28 08:08:38 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> perf script:
>
>   Adrian Hunter:
>
>   - Fix brstackinsn for AUXTRACE.
>
>   - Fix invalid LBR/binary mismatch error.
>
> perf diff:
>
>   Arnaldo Carvalho de Melo:
>
>   - Use llabs() with 64-bit values, fixing the build in some 32-bit
>     architectures.
>
> perf pmu:
>
>   Andi Kleen:
>
>   - Use file system cache to optimize sysfs access.
>
> x86:
>
>   Adrian Hunter:
>
>   - Add some more Intel instructions to the opcode map and to the perf
>     test entry:
>
>     gf2p8affineinvqb, gf2p8affineqb, gf2p8mulb, v4fmaddps,
>     v4fmaddss, v4fnmaddps, v4fnmaddss, vaesdec, vaesdeclast, vaesenc,
>     vaesenclast, vcvtne2ps2bf16, vcvtneps2bf16, vdpbf16ps,
>     vgf2p8affineinvqb, vgf2p8affineqb, vgf2p8mulb, vp2intersectd,
>     vp2intersectq, vp4dpwssd, vp4dpwssds, vpclmulqdq, vpcompressb,
>     vpcompressw, vpdpbusd, vpdpbusds, vpdpwssd, vpdpwssds, vpexpandb,
>     vpexpandw, vpopcntb, vpopcntd, vpopcntq, vpopcntw, vpshldd, vpshldq,
>     vpshldvd, vpshldvq, vpshldvw, vpshldw, vpshrdd, vpshrdq, vpshrdvd,
>     vpshrdvq, vpshrdvw, vpshrdw, vpshufbitqmb.
>
> perf affinity:
>
>   Andi Kleen:
>
>   - Add infrastructure to save/restore affinity
>
> perf maps:
>
>   Arnaldo Carvalho de Melo:
>
>   - Merge 'struct maps' with 'struct map_groups', as there is a
>     1x1 relationship, simplifying code overall.
>
> perf build:
>
>   Jiri Olsa:
>
>   - Allow linking with libbpf dynamically.
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>  66 files changed, 2230 insertions(+), 618 deletions(-)

Pulled, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes @ 2019-11-22 14:56 Arnaldo Carvalho de Melo 2019-11-23 8:07 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-22 14:56 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Alexey Budankov, Colin King, Hewenliang, Ian Rogers, Jin Yao, Steven Rostedt, Sudipm Mukherjee, Arnaldo Carvalho de Melo Hi Ingo/Thomas, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit 8f6ee51d772d0dab407d868449d2c5d9c8d2b6fc: Merge tag 'perf-core-for-mingo-5.5-20191119' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-11-19 12:59:03 +0100) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191122 for you to fetch changes up to 4584f084aa9d8033d5911935837dbee7b082d0e9: perf parse: Fix potential memory leak when handling tracepoint errors (2019-11-22 10:48:14 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf report: Jin Yao: - Allow entering the annotation view (symbol source/assembly + overhead/cycles/etc column) from the 'perf report --total-cycles' interface. 
E.g.: # perf record --all-cpus --branch-any --all-kernel ^C[ perf record: Woken up 5 times to write data ] # # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, exclude_user: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY # # perf report --total-cycles # # Samples: 78762 of event 'cycles' Sampled Sampled Avg Avg Cycles% Cycles Cycles% Cycles [Program Block Range] Shared Object 1.72% 95.8K 0.00% 254 [msr.h:105 -> msr.h:166] [kernel.vmlinux] 1.56% 107.6K 0.00% 618 [compiler.h:199 -> common.c:301] [kernel.vmlinux] 0.83% 46.3K 0.00% 409 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux] 0.83% 46.1K 0.00% 83 [jump_label.h:41 -> tsc.c:230] [kernel.vmlinux] 0.64% 36.9K 0.01% 1.4K [hda_intel.c:904 -> hda_intel.c:916] [snd_hda_intel] 0.57% 30.2K 0.00% 282 [file.c:710 -> file.c:730] [kernel.vmlinux] 0.48% 25.8K 0.00% 82 [spinlock.c:158 -> spinlock.c:160] [kernel.vmlinux] 0.45% 23.7K 0.00% 369 [tick-broadcast.c:585 -> tick-broadcast.c:586] [kernel.vmlinux] 0.44% 24.4K 0.00% 73 [msr.h:236 -> tsc.c:1088] [kernel.vmlinux] 0.43% 22.7K 0.00% 144 [cpuidle.c:229 -> cpuidle.c:232] [kernel.vmlinux] Then press 'A' or Enter on one of those lines, just like with 'perf top', say the top one: [msr.h:105 -> msr.h:166], then this shows up: Samples: 78K of event 'cycles', 4000 Hz, Event count (approx.): 78762 native_write_msr /lib/modules/5.4.0-rc8/build/vmlinux [Percent: local period] Percent│ IPC Cycle (Average IPC: 0.02, IPC Coverage: 50.0%) │ │ Disassembly of section .text: │ │ ffffffff8106c480 <native_write_msr>: │ __wrmsr(): │ return EAX_EDX_VAL(val, low, high); │ } │ │ static inline void notrace __wrmsr(unsigned int msr, u32 low, u32 high) │ { │ asm volatile("1: wrmsr\n" 49.16 │0.02 mov %edi,%ecx │0.02 mov %esi,%eax │0.02 wrmsr │ arch_static_branch(): │ 
#include <linux/stringify.h> │ #include <linux/types.h> │ │ static __always_inline bool arch_static_branch(struct static_key *key, bool branch) │ { │ asm_volatile_goto("1:" 0.79 │0.02 nop │ native_write_msr(): │ { │ __wrmsr(msr, low, high); │ │ if (msr_tracepoint_active(__tracepoint_write_msr)) │ do_trace_write_msr(msr, ((u64)high << 32 | low), 0); │ } 50.05 │0.02 254 ← retq │ do_trace_write_msr(msr, ((u64)high << 32 | low), 0); │ shl $0x20,%rdx │ mov %esi,%esi │ or %rdx,%rsi │ xor %edx,%edx │ → jmpq do_trace_write_msr

We need to improve this to show the source code line numbers in the annotation view, so one can go from that program block to the annotation view and see those source code line numbers straight away.

auxtrace/Intel PT:

  Adrian Hunter:

  - Add support for AUX area sampling. This requires new functionality that will land in 5.5; it is already in tip. This includes kernel capability querying so that it fails gracefully with older kernels, dumping aux area samples in 'perf report -D' and 'perf script'.

perf.data:

  Alexey Budankov:

  - Fix decompression of PERF_RECORD_COMPRESSED records.

core:

  Arnaldo Carvalho de Melo:

  - Use the 'dcacheline' cmp routine to find the right DSOs taking into account the 'maj', 'min', 'ino' and 'ino_generation', that got moved from 'struct map' to 'struct dso', where it belongs. This further reduces the size of 'struct map'; there is still more work to do to maybe get it down to at most one cacheline.

libtraceevent:

  Hewenliang:

  - Fix memory leakage in copy_filter_type().

  Sudip Mukherjee:

  - Fix header installation.

perf parse:

  Ian Rogers:

  - Fix potential memory leak when handling tracepoint errors, found using LLVM's libFuzzer.

perf probe:

  Colin Ian King:

  - Fix spelling mistake "addrees" -> "address".

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Adrian Hunter (14): perf tools: Add kernel AUX area sampling definitions perf record: Add a function to test for kernel support for AUX area sampling perf auxtrace: Move perf_evsel__find_pmu() perf auxtrace: Add support for AUX area sample recording perf record: Add support for AUX area sampling perf record: Add aux-sample-size config term perf inject: Cut AUX area samples perf auxtrace: Add support for dumping AUX area samples perf session: Add facility to peek at all events perf auxtrace: Add support for queuing AUX area samples perf pmu: When using default config, record which bits of config were changed by the user perf intel-pt: Add support for recording AUX area samples perf intel-pt: Add support for decoding AUX area samples perf intel-bts: Does not support AUX area sampling Alexey Budankov (1): perf session: Fix decompression of PERF_RECORD_COMPRESSED records Arnaldo Carvalho de Melo (5): perf map: Move maj/min/ino/ino_generation to separate struct perf map: Pass a dso_id to map__new() perf map: Move comparision of map's dso_id to a separate function perf dsos: Remove unused dsos__find() method perf dso: Move dso_id from 'struct map' to 'struct dso' Colin Ian King (1): perf probe: Fix spelling mistake "addrees" -> "address" Hewenliang (1): libtraceevent: Fix memory leakage in copy_filter_type Ian Rogers (1): perf parse: Fix potential memory leak when handling tracepoint errors Jin Yao (2): perf util: Move block TUI function to ui browsers perf report: Jump to symbol source view from total cycles view Sudip Mukherjee (1): libtraceevent: Fix header installation tools/include/uapi/linux/perf_event.h | 10 +- tools/lib/traceevent/Makefile | 8 +- tools/lib/traceevent/parse-filter.c | 9 +- tools/perf/Documentation/intel-pt.txt | 59 +++++- tools/perf/Documentation/perf-record.txt | 9 + tools/perf/arch/x86/util/auxtrace.c | 4 + 
tools/perf/arch/x86/util/intel-bts.c | 5 + tools/perf/arch/x86/util/intel-pt.c | 81 +++++++- tools/perf/builtin-inject.c | 29 +++ tools/perf/builtin-record.c | 21 +- tools/perf/builtin-report.c | 11 +- tools/perf/tests/attr/base-record | 2 +- tools/perf/tests/attr/base-stat | 2 +- tools/perf/tests/sample-parsing.c | 16 +- tools/perf/ui/browsers/hists.c | 78 +++++++- tools/perf/util/auxtrace.c | 322 ++++++++++++++++++++++++++++-- tools/perf/util/auxtrace.h | 43 ++++ tools/perf/util/block-info.c | 71 +------ tools/perf/util/block-info.h | 3 +- tools/perf/util/dso.c | 24 ++- tools/perf/util/dso.h | 13 ++ tools/perf/util/dsos.c | 97 +++++++-- tools/perf/util/dsos.h | 14 +- tools/perf/util/event.h | 6 + tools/perf/util/evlist.h | 1 + tools/perf/util/evsel.c | 31 +++ tools/perf/util/evsel_config.h | 13 ++ tools/perf/util/hist.h | 15 ++ tools/perf/util/intel-pt.c | 109 +++++++++- tools/perf/util/machine.c | 22 +- tools/perf/util/machine.h | 2 + tools/perf/util/map.c | 11 +- tools/perf/util/map.h | 9 +- tools/perf/util/parse-events.c | 65 +++++- tools/perf/util/parse-events.h | 1 + tools/perf/util/parse-events.l | 1 + tools/perf/util/perf_event_attr_fprintf.c | 3 +- tools/perf/util/pmu.c | 10 + tools/perf/util/pmu.h | 2 + tools/perf/util/probe-finder.c | 2 +- tools/perf/util/record.c | 31 +++ tools/perf/util/record.h | 2 + tools/perf/util/session.c | 82 ++++++-- tools/perf/util/session.h | 5 + tools/perf/util/sort.c | 24 +-- tools/perf/util/synthetic-events.c | 12 ++ 46 files changed, 1190 insertions(+), 200 deletions(-) Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. 
The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from an HTTP server to build them inside the container, to make it easier to build in a container cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping to put this in place.

Clearlinux is failing when building with libpython, but that is not a perf regression; I will try to remove one compiler warning that is causing the problem when building some of the glue code in the python files, outside perf.

Manjaro got fixed by adding the 'gettext' package, which provides a library needed by bison but not present in its dependencies list, i.e. a distro bug.

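For anyone wanting to scan the container build table that follows for failures, here is a minimal sketch, assuming one result row per line in the form "N image : status toolchain". The helper names (parse_container_rows, failing_images) and the SAMPLE_ROWS text are hypothetical, made up for illustration only; they are not part of perf or of the 'dm' script.

```python
import re
from typing import List, NamedTuple


class ContainerResult(NamedTuple):
    index: int       # running number in the first column
    image: str       # container image, e.g. "fedora:31"
    status: str      # "Ok" or "FAIL" as printed in the table
    toolchain: str   # compiler version strings, kept verbatim


# Assumed row layout: "<n> <image> : <status> <toolchain description>"
ROW_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s*:\s*(Ok|FAIL)\s+(.*)$")


def parse_container_rows(text: str) -> List[ContainerResult]:
    """Return one ContainerResult per line that looks like a table row."""
    results = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            index, image, status, toolchain = match.groups()
            results.append(
                ContainerResult(int(index), image, status, toolchain.strip())
            )
    return results


def failing_images(results: List[ContainerResult]) -> List[str]:
    """List the images whose status is anything other than 'Ok'."""
    return [r.image for r in results if r.status != "Ok"]


# Three rows copied from the table in this message.
SAMPLE_ROWS = """\
  46 manjaro:latest                : Ok   gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final)
  47 openmandriva:cooker           : FAIL gcc (GCC) 9.2.1 20191109 (OpenMandriva), clang version 9.0.1
  48 opensuse:15.0                 : Ok   gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548)
"""

if __name__ == "__main__":
    print(failing_images(parse_container_rows(SAMPLE_ROWS)))
    # prints ['openmandriva:cooker']
```

Lines that do not match the assumed layout (compiler errors, shell prompts) are simply skipped, so the whole message body could be fed in as-is.
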
cooker is failing with: In file included from cpumap.c:4: In file included from /git/linux/tools/include/linux/refcount.h:41: In file included from /git/linux/tools/include/linux/atomic.h:5: In file included from /git/linux/tools/include/asm/atomic.h:6: In file included from /git/linux/tools/include/asm/../../arch/x86/include/asm/atomic.h:11: /git/linux/tools/arch/x86/include/asm/cmpxchg.h:12:2: error: unknown attribute 'error' ignored [-Werror,-Wunknown-attributes] __compiletime_error("Bad argument size for cmpxchg"); ^ /git/linux/tools/include/linux/compiler-gcc.h:20:54: note: expanded from macro '__compiletime_error' # define __compiletime_error(message) __attribute__((error(message))) ^ LD /tmp/build/perf/fs/libapi-in.o Still needs investigating, new image, just leaving it here for documentation purposes, maybe related to it using the most recent gcc and clang versions? # export PERF_TARBALL=http://192.168.124.1/perf/perf-5.4.0-rc7.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (git://git.alpinelinux.org/aports 25c73ae7b95bdb42ae5f0ceac3b703e766582527) (based on LLVM 9.0.0) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 
amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 centos:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), clang version 7.0.1 (tags/RELEASE_701/final) 17 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20191101 gcc-9-branch@277702, clang version 9.0.0 (tags/RELEASE_900/final) 18 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 19 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 20 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 21 debian:experimental : Ok gcc (Debian 9.2.1-9) 9.2.1 20191008, clang version 8.0.1-3+b1 (tags/RELEASE_801/final) 22 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 24 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-19) 8.3.0 25 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 26 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 27 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 28 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 29 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 30 
fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 31 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 32 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 33 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 34 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 35 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 36 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 37 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 39 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc31) 40 fedora:32 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 41 fedora:rawhide : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 42 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 43 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 44 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 45 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 46 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final) 47 openmandriva:cooker : FAIL gcc (GCC) 9.2.1 20191109 (OpenMandriva), clang version 9.0.1 48 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 49 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190905 
[gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238) 50 opensuse:15.2 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238) 51 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 52 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190903 [gcc-9-branch revision 275330], clang version 9.0.0 (tags/RELEASE_900/final 372316) 53 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 54 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 55 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 56 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 57 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 58 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 59 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 62 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 63 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 65 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 66 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 
7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 74 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 75 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 76 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 77 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 78 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 79 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 80 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 81 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final) # # uname -a Linux quaco 5.4.0-rc8 #1 SMP Mon Nov 18 06:15:31 -03 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 4584f084aa9d perf parse: Fix potential memory leak when handling tracepoint errors # perf version --build-options perf version 5.4.rc7.g4584f084aa9d dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # 
HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 
37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . 
make_no_slang_O: make NO_SLANG=1 make_no_gtk2_O: make NO_GTK2=1 make_perf_o_O: make perf.o make_install_O: make install make_no_libnuma_O: make NO_LIBNUMA=1 make_no_libbpf_O: make NO_LIBBPF=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_no_libelf_O: make NO_LIBELF=1 make_pure_O: make make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_help_O: make help make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_cscope_O: make cscope make_no_newt_O: make NO_NEWT=1 make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 make_no_demangle_O: make NO_DEMANGLE=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_debug_O: make DEBUG=1 make_clean_all_O: make clean all make_no_libperl_O: make NO_LIBPERL=1 make_install_bin_O: make install-bin make_util_pmu_bison_o_O: make util/pmu-bison.o make_doc_O: make doc make_install_prefix_O: make install prefix=/tmp/krava make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_no_auxtrace_O: make NO_AUXTRACE=1 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_util_map_o_O: make util/map.o make_tags_O: make tags OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
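The 'perf test' output in the message above can be tallied in a similar way. What follows is a rough sketch, assuming one test per line in the form "ID: description : status"; the function names (parse_perf_test_output, tally_statuses) and SAMPLE_OUTPUT are hypothetical and exist only for this illustration. Group headers such as "22: Watchpoint :" carry no status of their own and are skipped, and the status is taken from the text after the last colon because some test names contain colons themselves.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Test IDs are either "22" or "22.1" (subtest), followed by a colon.
ID_RE = re.compile(r"^\s*(\d+(?:\.\d+)?):\s*(.*)$")


def parse_perf_test_output(text: str) -> Dict[str, Tuple[str, str]]:
    """Map test ID -> (description, status text) for lines with a status."""
    results = {}
    for line in text.splitlines():
        match = ID_RE.match(line)
        if not match:
            continue
        test_id, rest = match.groups()
        # Split on the LAST colon: names like "syscalls:sys_enter_openat"
        # contain colons of their own.
        name, sep, status = rest.rpartition(":")
        status = status.strip()
        if not sep or not status:
            continue  # group header, e.g. "22: Watchpoint :"
        results[test_id] = (name.strip(), status)
    return results


def tally_statuses(results: Dict[str, Tuple[str, str]]) -> Dict[str, int]:
    """Count results by the first word of the status ("Ok", "Skip", ...)."""
    return dict(Counter(status.split()[0] for _, status in results.values()))


# A few lines copied from the 'perf test' run in this thread.
SAMPLE_OUTPUT = """\
21: Breakpoint accounting                                 : Ok
22: Watchpoint                                            :
22.1: Read Only Watchpoint                                : Skip
22.2: Write Only Watchpoint                               : Ok
15: syscalls:sys_enter_openat event fields                : Ok
56: builtin clang support                                 : Skip (not compiled in)
"""

if __name__ == "__main__":
    print(tally_statuses(parse_perf_test_output(SAMPLE_OUTPUT)))
    # prints {'Ok': 3, 'Skip': 2}
```

Keeping the full status text in the parsed result preserves notes such as "(not compiled in)", while the tally only looks at the leading word.
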
* Re: [GIT PULL] perf/core improvements and fixes 2019-11-22 14:56 Arnaldo Carvalho de Melo @ 2019-11-23 8:07 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-11-23 8:07 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Alexey Budankov, Colin King, Hewenliang, Ian Rogers, Jin Yao, Steven Rostedt, Sudip Mukherjee, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit 8f6ee51d772d0dab407d868449d2c5d9c8d2b6fc: > > Merge tag 'perf-core-for-mingo-5.5-20191119' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-11-19 12:59:03 +0100) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191122 > > for you to fetch changes up to 4584f084aa9d8033d5911935837dbee7b082d0e9: > > perf parse: Fix potential memory leak when handling tracepoint errors (2019-11-22 10:48:14 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > perf report: > > Jin Yao: > > - Allow entering the annotation view (symbol source/assembly + > overhead/cycles/etc column) from the 'perf report --total-cycles' > interface. 
> > E.g.: > > # perf record --all-cpus --branch-any --all-kernel > ^C[ perf record: Woken up 5 times to write data ] > # > # perf evlist -v > cycles: size: 120, { sample_period, sample_freq }: 4000, > sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, > read_format: ID, disabled: 1, inherit: 1, exclude_user: 1, mmap: 1, comm: 1, freq: 1, task: 1, > precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, > bpf_event: 1, branch_sample_type: ANY > # > # perf report --total-cycles > # > # Samples: 78762 of event 'cycles' > Sampled Sampled Avg Avg > Cycles% Cycles Cycles% Cycles [Program Block Range] Shared Object > 1.72% 95.8K 0.00% 254 [msr.h:105 -> msr.h:166] [kernel.vmlinux] > 1.56% 107.6K 0.00% 618 [compiler.h:199 -> common.c:301] [kernel.vmlinux] > 0.83% 46.3K 0.00% 409 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux] > 0.83% 46.1K 0.00% 83 [jump_label.h:41 -> tsc.c:230] [kernel.vmlinux] > 0.64% 36.9K 0.01% 1.4K [hda_intel.c:904 -> hda_intel.c:916] [snd_hda_intel] > 0.57% 30.2K 0.00% 282 [file.c:710 -> file.c:730] [kernel.vmlinux] > 0.48% 25.8K 0.00% 82 [spinlock.c:158 -> spinlock.c:160] [kernel.vmlinux] > 0.45% 23.7K 0.00% 369 [tick-broadcast.c:585 -> tick-broadcast.c:586] [kernel.vmlinux] > 0.44% 24.4K 0.00% 73 [msr.h:236 -> tsc.c:1088] [kernel.vmlinux] > 0.43% 22.7K 0.00% 144 [cpuidle.c:229 -> cpuidle.c:232] [kernel.vmlinux] > > Then press 'A' or Enter on one of those lines, just like with 'perf top', say > the top one: [msr.h:105 -> msr.h:166], then this shows up: > > Samples: 78K of event 'cycles', 4000 Hz, Event count (approx.): 78762 > native_write_msr /lib/modules/5.4.0-rc8/build/vmlinux [Percent: local period] > Percent│ IPC Cycle (Average IPC: 0.02, IPC Coverage: 50.0%) > │ > │ Disassembly of section .text: > │ > │ ffffffff8106c480 <native_write_msr>: > │ __wrmsr(): > │ return EAX_EDX_VAL(val, low, high); > │ } > │ > │ static inline void notrace __wrmsr(unsigned int msr, u32 low, u32 high) > │ { > │ asm volatile("1: 
wrmsr\n" > 49.16 │0.02 mov %edi,%ecx > │0.02 mov %esi,%eax > │0.02 wrmsr > │ arch_static_branch(): > │ #include <linux/stringify.h> > │ #include <linux/types.h> > │ > │ static __always_inline bool arch_static_branch(struct static_key *key, bool branch) > │ { > │ asm_volatile_goto("1:" > 0.79 │0.02 nop > │ native_write_msr(): > │ { > │ __wrmsr(msr, low, high); > │ > │ if (msr_tracepoint_active(__tracepoint_write_msr)) > │ do_trace_write_msr(msr, ((u64)high << 32 | low), 0); > │ } > 50.05 │0.02 254 ← retq > │ do_trace_write_msr(msr, ((u64)high << 32 | low), 0); > │ shl $0x20,%rdx > │ mov %esi,%esi > │ or %rdx,%rsi > │ xor %edx,%edx > │ → jmpq do_trace_write_msr > > We need to improve this to show the source code line numbers in the > annotation view, so one can go from that program block to the annotation view > and see those source code line numbers straight away. > > auxtrace/Intel PT: > > Adrian Hunter: > > - Add support for AUX area sampling, requires new functionality that > will land in 5.5, it's already in tip. > > This includes kernel capability querying so that it fails gracefully > with older kernels, dumping aux area samples in 'perf report -D' and > 'perf script'. > > perf.data: > > Alexey Budankov: > > - Fix decompression of PERF_RECORD_COMPRESSED records. > > core: > > Arnaldo Carvalho de Melo: > > - Use the 'dcacheline' cmp routine to find the right DSOs taking into > account the 'maj', 'min', 'ino' and 'ino_generation', that got moved > from 'struct map' to 'struct dso', where it belongs. > > This further reduces the size of 'struct map', there is still more > work to do to maybe get it to max one cacheline. > > libtraceevent: > > Hewenliang: > > - Fix memory leakage in copy_filter_type(). > > Sudip Mukherjee: > > - Fix header installation. > > perf parse: > > Ian Rogers: > > - Fix potential memory leak when handling tracepoint errors, found using > LLVM's libFuzzer. 
> > perf probe: > > Colin Ian King: > > - Fix spelling mistake "addrees" -> "address". > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > 46 files changed, 1190 insertions(+), 200 deletions(-) Pulled, thanks a lot Arnaldo! Ingo ^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes @ 2019-11-19 11:32 Arnaldo Carvalho de Melo 2019-11-19 12:00 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-19 11:32 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Ian Rogers, James Clark, Konstantin Khlebnikov, Masami Hiramatsu, Arnaldo Carvalho de Melo Hi Ingo/Thomas, Please consider pulling, Best regards, - Arnaldo The following changes since commit e1e9b78d3957a267346a86c8f2c433f6a332af65: perf parse: Use YYABORT to clear stack after failure, plugging leaks (2019-11-12 08:34:16 -0300) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191119 for you to fetch changes up to a910e4666d61712840c78de33cc7f89de8affa78: perf parse: Report initial event parsing error (2019-11-18 19:14:29 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: x86/insn: Adrian Hunter: - Add some more Intel instructions to the opcode map: cldemote, encls, enclu, enclv, enqcmd, enqcmds, movdir64b, movdiri, pconfig, tpause, umonitor, umwait, wbnoinvd. 
- The instruction decoding can be tested using the perf tools' "x86 instruction decoder - new instructions" test as follows: $ perf test -v "new " 2>&1 | grep -i cldemote Decoded ok: 0f 1c 00 cldemote (%eax) Decoded ok: 0f 1c 05 78 56 34 12 cldemote 0x12345678 Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%eax,%ecx,8) Decoded ok: 0f 1c 00 cldemote (%rax) Decoded ok: 41 0f 1c 00 cldemote (%r8) Decoded ok: 0f 1c 04 25 78 56 34 12 cldemote 0x12345678 Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%rax,%rcx,8) Decoded ok: 41 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%r8,%rcx,8) $ perf test -v "new " 2>&1 | grep -i tpause Decoded ok: 66 0f ae f3 tpause %ebx Decoded ok: 66 0f ae f3 tpause %ebx Decoded ok: 66 41 0f ae f0 tpause %r8d callchains: Adrian Hunter: - Fix segfault in thread__resolve_callchain_sample(). perf probe: - Line fixes to show only lines where probes can be used with 'perf probe -L', and when reporting them via 'perf probe -l'. - Support multiprobe events. perf scripts python: Adrian Hunter: - Fix use of TRUE with SQLite < 3.23 in exported-sql-viewer.py. perf maps: - Trim 'struct map' by removing the rb_node member for sorting by map name, as that is only needed for processing kernel maps, and only when classifying symbols by section at load time. Sort them by name using qsort() and do lookups using bsearch() when map_groups__find_by_name() is used. perf parse: Ian Rogers: - Report initial event parsing error, providing a less cryptic message to state that a PMU wasn't found in the system. perf vendor events: James Clark: - Fix commas so that PMU event files for arm64, power8 and power9 become valid JSON. libtraceevent: Konstantin Khlebnikov: - Fix parsing of event %o and %X argument types. 
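As a rough illustration of the 'perf maps' item above (keep the entries in a plain array, sort it by name with qsort(), then look names up with bsearch()), here is a minimal standalone C sketch. It is not the code in tools/perf/util/map.c: 'struct demo_map' and the demo_* helpers are hypothetical names made up for this example, and only the standard qsort()/bsearch() calls are real.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for perf's 'struct map'; only the name matters here. */
struct demo_map {
	const char *name;
	unsigned long start;
};

/* qsort() comparator: both arguments point at array slots holding a struct demo_map *. */
static int demo_map__cmp_by_name(const void *a, const void *b)
{
	const struct demo_map *const *ma = a;
	const struct demo_map *const *mb = b;

	return strcmp((*ma)->name, (*mb)->name);
}

/* bsearch() comparator: the key is a name string, the element is an array slot. */
static int demo_map__cmp_key(const void *key, const void *elem)
{
	const struct demo_map *const *m = elem;

	return strcmp(key, (*m)->name);
}

/* Sort the pointer array in place, ordered by map name. */
static void demo_maps__sort_by_name(struct demo_map **maps, size_t nr)
{
	assert(maps != NULL || nr == 0);
	if (nr > 1)
		qsort(maps, nr, sizeof(*maps), demo_map__cmp_by_name);
}

/* Binary search by name; the array must already be sorted by name. */
static struct demo_map *demo_maps__find_by_name(struct demo_map **maps, size_t nr,
						const char *name)
{
	struct demo_map **found;

	if (nr == 0)
		return NULL;

	found = bsearch(name, maps, nr, sizeof(*maps), demo_map__cmp_key);
	return found ? *found : NULL;
}
```

The trade-off in this sketch is that no per-entry tree node is needed: the sorted array is only built when a by-name lookup is actually wanted, and it has to be sorted again if entries are added or removed.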
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Adrian Hunter (4): perf scripts python: exported-sql-viewer.py: Fix use of TRUE with SQLite perf callchain: Fix segfault in thread__resolve_callchain_sample() x86/insn: perf tools: Add some instructions to the new instructions test x86/insn: Add some Intel instructions to the opcode map Arnaldo Carvalho de Melo (9): perf maps: Purge the entries from maps->names in __maps__purge() perf maps: Do not use an rbtree to sort by map name perf map_groups: Add a front end cache for map lookups by name perf map: No need to adjust the long name of modules perf record: No need to process the synthesized MMAP events twice perf machine: No need to check if kernel module maps pre-exist perf map_groups: Auto sort maps by name, if needed perf map: Use bitmap for booleans perf map: Move seldom used ->flags field to second cacheline Ian Rogers (1): perf parse: Report initial event parsing error James Clark (3): perf vendor events arm64: Fix commas so PMU event files are valid JSON perf vendor events power8: Fix commas so PMU event files are valid JSON perf vendor events power9: Fix commas so PMU event files are valid JSON Konstantin Khlebnikov (1): libtraceevent: Fix parsing of event %o and %X argument types Masami Hiramatsu (7): perf probe: Show correct statement line number by perf probe -l perf probe: Verify given line is a representive line perf probe: Do not show non representive lines by perf-probe -L perf probe: Generate event name with line number perf probe: Support multiprobe event perf probe: Support DW_AT_const_value constant value perf probe: Trace a magic number if variable is not found arch/x86/lib/x86-opcode-map.txt | 18 +- tools/arch/x86/lib/x86-opcode-map.txt | 18 +- tools/lib/traceevent/event-parse.c | 7 +- tools/perf/arch/powerpc/util/kvm-stat.c | 4 +- tools/perf/arch/x86/tests/insn-x86-dat-32.c | 52 + 
tools/perf/arch/x86/tests/insn-x86-dat-64.c | 62 ++ tools/perf/arch/x86/tests/insn-x86-dat-src.c | 109 ++ tools/perf/builtin-record.c | 29 +- tools/perf/builtin-stat.c | 2 + tools/perf/builtin-trace.c | 16 +- .../pmu-events/arch/arm64/ampere/emag/branch.json | 8 +- .../pmu-events/arch/arm64/ampere/emag/bus.json | 14 +- .../pmu-events/arch/arm64/ampere/emag/cache.json | 28 +- .../pmu-events/arch/arm64/ampere/emag/clock.json | 2 +- .../arch/arm64/ampere/emag/exception.json | 26 +- .../arch/arm64/ampere/emag/instruction.json | 28 +- .../arch/arm64/ampere/emag/intrinsic.json | 10 +- .../pmu-events/arch/arm64/ampere/emag/memory.json | 12 +- .../arch/arm64/ampere/emag/pipeline.json | 2 +- .../arch/arm64/arm/cortex-a53/branch.json | 2 +- .../pmu-events/arch/arm64/arm/cortex-a53/bus.json | 4 +- .../arch/arm64/arm/cortex-a53/other.json | 4 +- .../arm64/arm/cortex-a57-a72/core-imp-def.json | 120 +- .../pmu-events/arch/arm64/armv8-recommended.json | 158 +-- .../arch/arm64/cavium/thunderx2/core-imp-def.json | 74 +- .../arch/arm64/hisilicon/hip08/core-imp-def.json | 60 +- .../arch/arm64/hisilicon/hip08/uncore-ddrc.json | 18 +- .../arch/arm64/hisilicon/hip08/uncore-hha.json | 22 +- .../arch/arm64/hisilicon/hip08/uncore-l3c.json | 28 +- .../perf/pmu-events/arch/powerpc/power8/cache.json | 60 +- .../arch/powerpc/power8/floating-point.json | 6 +- .../pmu-events/arch/powerpc/power8/frontend.json | 158 +-- .../pmu-events/arch/powerpc/power8/marked.json | 266 ++--- .../pmu-events/arch/powerpc/power8/memory.json | 72 +- .../perf/pmu-events/arch/powerpc/power8/other.json | 1150 ++++++++++---------- .../pmu-events/arch/powerpc/power8/pipeline.json | 118 +- tools/perf/pmu-events/arch/powerpc/power8/pmc.json | 48 +- .../arch/powerpc/power8/translation.json | 60 +- .../perf/pmu-events/arch/powerpc/power9/cache.json | 44 +- .../arch/powerpc/power9/floating-point.json | 14 +- .../pmu-events/arch/powerpc/power9/frontend.json | 142 +-- .../pmu-events/arch/powerpc/power9/marked.json | 250 ++--- 
.../pmu-events/arch/powerpc/power9/memory.json | 52 +- .../perf/pmu-events/arch/powerpc/power9/other.json | 934 ++++++++-------- .../pmu-events/arch/powerpc/power9/pipeline.json | 212 ++-- tools/perf/pmu-events/arch/powerpc/power9/pmc.json | 48 +- .../arch/powerpc/power9/translation.json | 92 +- tools/perf/scripts/python/exported-sql-viewer.py | 12 +- tools/perf/tests/map_groups.c | 2 +- tools/perf/tests/parse-events.c | 3 +- tools/perf/util/dwarf-aux.c | 62 +- tools/perf/util/machine.c | 43 +- tools/perf/util/machine.h | 2 - tools/perf/util/map.c | 116 +- tools/perf/util/map.h | 7 +- tools/perf/util/map_groups.h | 21 +- tools/perf/util/metricgroup.c | 2 +- tools/perf/util/parse-events.c | 78 +- tools/perf/util/parse-events.h | 4 + tools/perf/util/probe-event.c | 19 +- tools/perf/util/probe-event.h | 3 + tools/perf/util/probe-file.c | 14 + tools/perf/util/probe-file.h | 2 + tools/perf/util/probe-finder.c | 116 +- tools/perf/util/probe-finder.h | 1 + tools/perf/util/symbol.c | 84 +- 66 files changed, 2888 insertions(+), 2366 deletions(-) ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2019-11-19 11:32 Arnaldo Carvalho de Melo @ 2019-11-19 12:00 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-11-19 12:00 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Ian Rogers, James Clark, Konstantin Khlebnikov, Masami Hiramatsu, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > > The following changes since commit e1e9b78d3957a267346a86c8f2c433f6a332af65: > > perf parse: Use YYABORT to clear stack after failure, plugging leaks (2019-11-12 08:34:16 -0300) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191119 > > for you to fetch changes up to a910e4666d61712840c78de33cc7f89de8affa78: > > perf parse: Report initial event parsing error (2019-11-18 19:14:29 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > x86/insn: > > Adrian Hunter: > > - Add some more Intel instructions to the opcode map: > > cldemote, encls, enclu, enclv, enqcmd, enqcmds, movdir64b, > movdiri, pconfig, tpause, umonitor, umwait, wbnoinvd. 
> > - The instruction decoding can be tested using the perf tools' > "x86 instruction decoder - new instructions" test as follows: > > $ perf test -v "new " 2>&1 | grep -i cldemote > Decoded ok: 0f 1c 00 cldemote (%eax) > Decoded ok: 0f 1c 05 78 56 34 12 cldemote 0x12345678 > Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%eax,%ecx,8) > Decoded ok: 0f 1c 00 cldemote (%rax) > Decoded ok: 41 0f 1c 00 cldemote (%r8) > Decoded ok: 0f 1c 04 25 78 56 34 12 cldemote 0x12345678 > Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%rax,%rcx,8) > Decoded ok: 41 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%r8,%rcx,8) > $ perf test -v "new " 2>&1 | grep -i tpause > Decoded ok: 66 0f ae f3 tpause %ebx > Decoded ok: 66 0f ae f3 tpause %ebx > Decoded ok: 66 41 0f ae f0 tpause %r8d > > callchains: > > Adrian Hunter: > > - Fix segfault in thread__resolve_callchain_sample(). > > perf probe: > > - Line fixes to show only lines where probes can be used with 'perf probe -L', > and when reporting them via 'perf probe -l'. > > - Support multiprobe events. > > perf scripts python: > > Adrian Hunter: > > - Fix use of TRUE with SQLite < 3.23 in exported-sql-viewer.py. > > perf maps: > > - Trim 'struct map' by removing the rb_node member for sorting > by map name, as that is only needed for processing kernel maps, > and only when classifying symbols by section at load time. > Sort them by name using qsort() and do lookups using bsearch() > when map_groups__find_by_name() is used. > > perf parse: > > Ian Rogers: > > - Report initial event parsing error, providing a less cryptic message > to state that a PMU wasn't found in the system. > > perf vendor events: > > James Clark: > > - Fix commas so that PMU event files for arm64, power8 and power9 > become valid JSON. > > libtraceevent: > > Konstantin Khlebnikov: > > - Fix parsing of event %o and %X argument types. 
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > 66 files changed, 2888 insertions(+), 2366 deletions(-) Pulled, thanks a lot Arnaldo! Ingo ^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes @ 2019-11-12 18:37 Arnaldo Carvalho de Melo 2019-11-15 7:35 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-11-12 18:37 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Ian Rogers, Ravi Bangoria, Arnaldo Carvalho de Melo Hi Ingo/Thomas, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit 56b2147f34d057b0898c53a3eb2e9e70756ab89f: Merge tag 'perf-core-for-mingo-5.5-20191107' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-11-12 12:06:08 +0100) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf-core-for-mingo-5.5-20191112 for you to fetch changes up to e1e9b78d3957a267346a86c8f2c433f6a332af65: perf parse: Use YYABORT to clear stack after failure, plugging leaks (2019-11-12 08:34:16 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf record: Ravi Bangoria: - Provide an option to print perf_event_open args and syscall return value. This was already possible using -v, but then lots of other debug info would be output as well, provide a way to show just the syscall args and return value, e.g.: # perf --debug perf-event-open=1 record perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID disabled 1 inherit 1 <SNIP> ksymbol 1 bpf_event 1 ------------------------------------------------------------ sys_perf_event_open: pid 4308 cpu 0 group_fd -1 flags 0x8 = 4 core: - Remove map->groups, we can get that information in other ways, reduces the size of a key data structure and paves the way to have it shared by multiple threads. 
- Use 'struct map_symbol' in more places, where we already were using a 'struct map' + 'struct symbol', this helps passing that usual pair of information across callchain, browser code, etc. - Add 'struct map_groups' (where the map_symbol->map is) to 'struct map_symbol', to ease annotation code, for instance, where we call from functions in one map we're browsing to functions in another DSO, mapped in another 'struct map'. event parsing: Ian Rogers: - Use YYABORT to clear stack after failure, plugging leaks Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Arnaldo Carvalho de Melo (13): perf map: Use map->dso->kernel + map__kmaps() in map__kmaps() perf symbols: Stop using map->groups, we can use kmaps instead perf map_groups: Pass the object to map_groups__find_ams() perf tools: Add map_groups to 'struct addr_location' perf annotate: Pass a 'map_symbol' in places receiving a pair of 'map' and 'symbol' pointers perf unwind: Use 'struct map_symbol' in 'struct unwind_entry' perf callchain: Use 'struct map_symbol' in 'struct callchain_cursor_node' perf tools: Make 'struct addr_map_symbol' contain 'struct map_symbol' perf symbols: Use kmaps(map)->machine when we know its a kernel map perf tools: Add a 'struct map_groups' pointer to 'struct map_symbol' perf annotate: Stop using map->groups, use map_symbol->mg instead perf map: Combine maps__fixup_overlappings with its only use perf map: Remove ->groups from 'struct map' Ian Rogers (1): perf parse: Use YYABORT to clear stack after failure, plugging leaks Ravi Bangoria (1): perf tool: Provide an option to print perf_event_open args and return value tools/perf/Documentation/perf.txt | 2 + tools/perf/arch/s390/annotate/instructions.c | 8 +- tools/perf/builtin-annotate.c | 6 +- tools/perf/builtin-kmem.c | 4 +- tools/perf/builtin-report.c | 2 +- tools/perf/builtin-sched.c | 2 +- tools/perf/builtin-top.c | 6 +- tools/perf/tests/dwarf-unwind.c | 2 +- 
tools/perf/ui/browsers/annotate.c | 25 +++-- tools/perf/ui/browsers/hists.c | 20 ++-- tools/perf/ui/gtk/annotate.c | 27 +++--- tools/perf/util/annotate.c | 105 ++++++++++----------- tools/perf/util/annotate.h | 22 ++--- tools/perf/util/callchain.c | 40 ++++---- tools/perf/util/callchain.h | 5 +- tools/perf/util/db-export.c | 16 ++-- tools/perf/util/debug.c | 2 + tools/perf/util/debug.h | 9 ++ tools/perf/util/event.c | 6 +- tools/perf/util/evsel.c | 36 +++---- tools/perf/util/evsel_fprintf.c | 29 +++--- tools/perf/util/hist.c | 58 ++++++------ tools/perf/util/machine.c | 48 ++++++---- tools/perf/util/map.c | 46 +++------ tools/perf/util/map.h | 1 - tools/perf/util/map_groups.h | 2 +- tools/perf/util/map_symbol.h | 5 +- tools/perf/util/mem-events.c | 2 +- tools/perf/util/parse-events.y | 3 +- tools/perf/util/python.c | 1 + .../perf/util/scripting-engines/trace-event-perl.c | 16 ++-- .../util/scripting-engines/trace-event-python.c | 18 ++-- tools/perf/util/sort.c | 89 ++++++++--------- tools/perf/util/symbol-elf.c | 2 +- tools/perf/util/symbol.c | 16 +--- tools/perf/util/symbol.h | 2 +- tools/perf/util/unwind-libdw.c | 7 +- tools/perf/util/unwind-libunwind-local.c | 7 +- tools/perf/util/unwind.h | 8 +- 39 files changed, 347 insertions(+), 358 deletions(-) Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. 
Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping having this in place. Clearlinux is failing when building with libpython, but that is not a perf regression, will try to remove one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf. Manjaro is failing due to some missing library related to bison, looks like a distro bug. 
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.4.0-rc7.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 centos:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), clang version 7.0.1 (tags/RELEASE_701/final) 17 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20191101 gcc-9-branch@277702, clang version 9.0.0 (tags/RELEASE_900/final) 18 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 19 debian:9 : Ok gcc 
(Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 20 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 21 debian:experimental : Ok gcc (Debian 9.2.1-9) 9.2.1 20191008, clang version 8.0.1-3+b1 (tags/RELEASE_801/final) 22 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 24 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-19) 8.3.0 25 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 26 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 27 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 28 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 29 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 30 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 31 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 32 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 33 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 34 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 35 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 36 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 37 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 
2019.03-rc1) 8.3.1 20190225 39 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc31) 40 fedora:32 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 41 fedora:rawhide : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 42 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 43 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 44 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 45 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 46 manjaro:latest : FAIL gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final) 47 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 48 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 7.0.1 (tags/RELEASE_701/final 349238) 49 opensuse:15.2 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 7.0.1 (tags/RELEASE_701/final 349238) 50 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 51 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190903 [gcc-9-branch revision 275330], clang version 9.0.0 (tags/RELEASE_900/final 372316) 52 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 53 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 54 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 55 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 56 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 
(tags/RELEASE_34/final) (based on LLVM 3.4) 57 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 58 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 62 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 63 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 65 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 74 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 75 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 76 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, 
clang version 8.0.0-3 (tags/RELEASE_800/final) 77 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 78 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 79 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 80 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final) # # uname -a Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 e1e9b78d3957 perf parse: Use YYABORT to clear stack after failure, plugging leaks # perf version --build-options perf version 5.4.rc7.ge1e9b78d3957 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 
14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF 
unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_no_libelf_O: make NO_LIBELF=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_no_demangle_O: make NO_DEMANGLE=1 make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_no_libbpf_O: make NO_LIBBPF=1 make_install_O: make install make_install_prefix_O: make install prefix=/tmp/krava make_cscope_O: make cscope make_no_libnuma_O: make NO_LIBNUMA=1 make_help_O: make help make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_debug_O: make DEBUG=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_perf_o_O: make perf.o make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_libperl_O: make NO_LIBPERL=1 make_install_bin_O: make install-bin make_tags_O: make tags make_pure_O: make make_no_newt_O: make NO_NEWT=1 make_no_gtk2_O: make NO_GTK2=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_clean_all_O: make clean all make_doc_O: make doc make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_util_map_o_O: make util/map.o make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 
NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 make_no_slang_O: make NO_SLANG=1 make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_auxtrace_O: make NO_AUXTRACE=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $
* Re: [GIT PULL] perf/core improvements and fixes 2019-11-12 18:37 Arnaldo Carvalho de Melo @ 2019-11-15 7:35 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-11-15 7:35 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Ian Rogers, Ravi Bangoria, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit 56b2147f34d057b0898c53a3eb2e9e70756ab89f: > > Merge tag 'perf-core-for-mingo-5.5-20191107' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-11-12 12:06:08 +0100) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf-core-for-mingo-5.5-20191112 > > for you to fetch changes up to e1e9b78d3957a267346a86c8f2c433f6a332af65: > > perf parse: Use YYABORT to clear stack after failure, plugging leaks (2019-11-12 08:34:16 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > perf record: > > Ravi Bangoria: > > - Provide an option to print perf_event_open args and syscall return value. 
> This was already possible using -v, but then lots of other debug info > would be output as well, provide a way to show just the syscall args > and return value, e.g.: > > # perf --debug perf-event-open=1 record > perf_event_attr: > size 112 > { sample_period, sample_freq } 4000 > sample_type IP|TID|TIME|PERIOD > read_format ID > disabled 1 > inherit 1 > <SNIP> > ksymbol 1 > bpf_event 1 > ------------------------------------------------------------ > sys_perf_event_open: pid 4308 cpu 0 group_fd -1 flags 0x8 = 4 > > core: > > - Remove map->groups, we can get that information in other ways, reduces > the size of a key data structure and paves the way to have it shared > by multiple threads. > > - Use 'struct map_symbol' in more places, where we already were using a > 'struct map' + 'struct symbol', this helps passing that usual pair of > information across callchain, browser code, etc. > > - Add 'struct map_groups' (where the map_symbol->map is) to 'struct map_symbol', > to ease annotation code, for instance, where we call from functions in one map > we're browsing to functions in another DSO, mapped in another 'struct map'. 
> > event parsing: > > Ian Rogers: > > - Use YYABORT to clear stack after failure, plugging leaks > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Arnaldo Carvalho de Melo (13): > perf map: Use map->dso->kernel + map__kmaps() in map__kmaps() > perf symbols: Stop using map->groups, we can use kmaps instead > perf map_groups: Pass the object to map_groups__find_ams() > perf tools: Add map_groups to 'struct addr_location' > perf annotate: Pass a 'map_symbol' in places receiving a pair of 'map' and 'symbol' pointers > perf unwind: Use 'struct map_symbol' in 'struct unwind_entry' > perf callchain: Use 'struct map_symbol' in 'struct callchain_cursor_node' > pref tools: Make 'struct addr_map_symbol' contain 'struct map_symbol' > perf symbols: Use kmaps(map)->machine when we know its a kernel map > perf tools: Add a 'struct map_groups' pointer to 'struct map_symbol' > perf annotate: Stop using map->groups, use map_symbol->mg instead > perf map: Combine maps__fixup_overlappings with its only use > perf map: Remove ->groups from 'struct map' > > Ian Rogers (1): > perf parse: Use YYABORT to clear stack after failure, plugging leaks > > Ravi Bangoria (1): > perf tool: Provide an option to print perf_event_open args and return value > > tools/perf/Documentation/perf.txt | 2 + > tools/perf/arch/s390/annotate/instructions.c | 8 +- > tools/perf/builtin-annotate.c | 6 +- > tools/perf/builtin-kmem.c | 4 +- > tools/perf/builtin-report.c | 2 +- > tools/perf/builtin-sched.c | 2 +- > tools/perf/builtin-top.c | 6 +- > tools/perf/tests/dwarf-unwind.c | 2 +- > tools/perf/ui/browsers/annotate.c | 25 +++-- > tools/perf/ui/browsers/hists.c | 20 ++-- > tools/perf/ui/gtk/annotate.c | 27 +++--- > tools/perf/util/annotate.c | 105 ++++++++++----------- > tools/perf/util/annotate.h | 22 ++--- > tools/perf/util/callchain.c | 40 ++++---- > tools/perf/util/callchain.h | 5 +- > tools/perf/util/db-export.c | 16 ++-- > 
tools/perf/util/debug.c | 2 + > tools/perf/util/debug.h | 9 ++ > tools/perf/util/event.c | 6 +- > tools/perf/util/evsel.c | 36 +++---- > tools/perf/util/evsel_fprintf.c | 29 +++--- > tools/perf/util/hist.c | 58 ++++++------ > tools/perf/util/machine.c | 48 ++++++---- > tools/perf/util/map.c | 46 +++------ > tools/perf/util/map.h | 1 - > tools/perf/util/map_groups.h | 2 +- > tools/perf/util/map_symbol.h | 5 +- > tools/perf/util/mem-events.c | 2 +- > tools/perf/util/parse-events.y | 3 +- > tools/perf/util/python.c | 1 + > .../perf/util/scripting-engines/trace-event-perl.c | 16 ++-- > .../util/scripting-engines/trace-event-python.c | 18 ++-- > tools/perf/util/sort.c | 89 ++++++++--------- > tools/perf/util/symbol-elf.c | 2 +- > tools/perf/util/symbol.c | 16 +--- > tools/perf/util/symbol.h | 2 +- > tools/perf/util/unwind-libdw.c | 7 +- > tools/perf/util/unwind-libunwind-local.c | 7 +- > tools/perf/util/unwind.h | 8 +- > 39 files changed, 347 insertions(+), 358 deletions(-) Pulled, thanks a lot Arnaldo! Ingo
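For readers of the `perf --debug perf-event-open=1` example quoted in the message above, here is a minimal Python sketch of how the final `sys_perf_event_open:` line could be picked apart. The line layout is taken from that quoted example only; the regular expression and the names `parse_open_line()` and `decode_flags()` are hypothetical helpers, not part of perf. The flag bit names follow the `PERF_FLAG_*` constants documented for perf_event_open(2).

```python
import re

# perf_event_open(2) flag bits (PERF_FLAG_* in the syscall documentation).
PERF_FLAGS = {
    0x1: "FD_NO_GROUP",
    0x2: "FD_OUTPUT",
    0x4: "PID_CGROUP",
    0x8: "FD_CLOEXEC",
}

# Assumed layout, copied from the quoted example line:
#   sys_perf_event_open: pid 4308 cpu 0 group_fd -1 flags 0x8 = 4
LINE_RE = re.compile(
    r"sys_perf_event_open: pid (-?\d+)\s+cpu (-?\d+)\s+group_fd (-?\d+)\s+"
    r"flags ((?:0x)?[0-9a-fA-F]+)(?:\s*=\s*(-?\d+))?"
)


def decode_flags(value):
    """Return the names of the flag bits set in value."""
    return [name for bit, name in PERF_FLAGS.items() if value & bit]


def parse_open_line(line):
    """Parse one debug line into a dict, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    pid, cpu, group_fd, flags, ret = m.groups()
    return {
        "pid": int(pid),
        "cpu": int(cpu),
        "group_fd": int(group_fd),
        "flags": decode_flags(int(flags, 16)),  # the example prints flags in hex
        "ret": None if ret is None else int(ret),
    }
```

Fed the quoted line, this reports pid 4308, cpu 0, group_fd -1, the `FD_CLOEXEC` flag, and a return value (the new file descriptor) of 4.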
* [GIT PULL] perf/core improvements and fixes @ 2019-10-21 13:37 Arnaldo Carvalho de Melo 2019-10-21 23:16 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-10-21 13:37 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Andi Kleen, Brendan Gregg, Daniel Bristot de Oliveira, Ian Rogers, Jin Yao, John Garry, Leo Yan, Steven Rostedt, Thomas Richter, Arnaldo Carvalho de Melo Hi Ingo/Thomas, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit 39b656ee9f2ce41eb969c86525f9a2a63fefac5b: Merge tag 'perf-core-for-mingo-5.5-20191011' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-10-15 07:19:55 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191021 for you to fetch changes up to 27198a893ba074407e7a87e346252b3e6fab454f: perf trace: Use STUL_STRARRAY_FLAGS with mmap (2019-10-19 15:35:02 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf trace: - Add syscall failure stats to -s/--summary and -S/--with-summary, also works in combination with specifying just a set of syscalls, see below first with -s/--summary, then with -S/--with-summary just for the syscalls we saw failing with -s: # perf trace -s sleep 1 Summary of events: sleep (16218), 80 events, 93.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) ----------- ----- ------ -------- -------- -------- -------- ------ nanosleep 1 0 1000.091 1000.091 1000.091 1000.091 0.00% mmap 8 0 0.045 0.005 0.006 0.008 7.09% mprotect 4 0 0.028 0.005 0.007 0.009 11.38% openat 3 0 0.021 0.005 0.007 0.009 14.07% munmap 1 0 0.017 0.017 0.017 0.017 0.00% brk 4 0 0.010 0.001 0.002 0.004 23.15% read 4 0 
0.009 0.002 0.002 0.003 8.13%
    close           5      0    0.008    0.001    0.002    0.002 10.83%
    fstat           3      0    0.006    0.002    0.002    0.002  6.97%
    access          1      1    0.006    0.006    0.006    0.006  0.00%
    lseek           3      0    0.005    0.001    0.002    0.002  7.37%
    arch_prctl      2      1    0.004    0.001    0.002    0.002 17.64%
    execve          1      0    0.000    0.000    0.000    0.000  0.00%

    # perf trace -e access,arch_prctl -S sleep 1
         0.000 ( 0.006 ms): sleep/19503 arch_prctl(option: 0x3001, arg2: 0x7fff165996b0) = -1 EINVAL (Invalid argument)
         0.024 ( 0.006 ms): sleep/19503 access(filename: 0x2177e510, mode: R) = -1 ENOENT (No such file or directory)
         0.136 ( 0.002 ms): sleep/19503 arch_prctl(option: SET_FS, arg2: 0x7f9421737580) = 0

     Summary of events:

     sleep (19503), 6 events, 50.0%

    syscall    calls errors  total    min    avg    max stddev
                            (msec) (msec) (msec) (msec)    (%)
    ---------- ----- ------ ------ ------ ------ ------ ------
    arch_prctl     2      1  0.008  0.002  0.004  0.006 57.22%
    access         1      1  0.006  0.006  0.006  0.006  0.00%

    #

  - Introduce --errno-summary, to drill down a bit more in the errno stats:

    # perf trace --errno-summary -e access,arch_prctl -S sleep 1
         0.000 ( 0.006 ms): sleep/5587 arch_prctl(option: 0x3001, arg2: 0x7ffd6ba6aa00) = -1 EINVAL (Invalid argument)
         0.028 ( 0.007 ms): sleep/5587 access(filename: 0xb83d9510, mode: R) = -1 ENOENT (No such file or directory)
         0.172 ( 0.003 ms): sleep/5587 arch_prctl(option: SET_FS, arg2: 0x7f45b8392580) = 0

     Summary of events:

     sleep (5587), 6 events, 50.0%

    syscall    calls errors  total    min    avg    max stddev
                            (msec) (msec) (msec) (msec)    (%)
    ---------- ----- ------ ------ ------ ------ ------ ------
    arch_prctl     2      1  0.009  0.003  0.005  0.006 38.90%
                    EINVAL: 1
    access         1      1  0.007  0.007  0.007  0.007  0.00%
                    ENOENT: 1

    #

  - Filter own pid to avoid a feedback loop in 'perf trace record -a'

  - Add the glue for the auto generated x86 IRQ vector array.
  - Show error message when not finding a field used in a filter expression

    # perf trace --max-events=4 -e syscalls:sys_enter_write --filter="cnt>32767"
    Failed to set filter "(cnt>32767) && (common_pid != 19938 && common_pid != 8922)" on event syscalls:sys_enter_write with 22 (Invalid argument)
    #
    # perf trace --max-events=4 -e syscalls:sys_enter_write --filter="count>32767"
         0.000 python3.5/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0dc53600, count: 172086)
        12.641 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0db63660, count: 75994)
        27.738 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0db4b1e0, count: 41635)
       136.070 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0dbab510, count: 62232)
    #

  - Add a generator for x86's IRQ vectors -> strings

  - Introduce strtoul() (string -> number) methods for the strarray and
    strarrays classes, also strtoul_flags, allowing to go from both strings and
    or-ed strings to numbers, allowing things like:

    # perf trace -e syscalls:sys_enter_mmap --filter="flags==DENYWRITE|PRIVATE|FIXED" sleep 1
         0.000 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2aa5000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000)
         0.011 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2bf2000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000)
         0.015 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2c3f000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000)
    #

    Allowing to narrow down from the complete set of mmap calls for that workload:

    # perf trace -e syscalls:sys_enter_mmap sleep 1
         0.000 sleep/22695 syscalls:sys_enter_mmap(len: 134773, prot: READ, flags: PRIVATE, fd: 3)
         0.041 sleep/22695 syscalls:sys_enter_mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS)
         0.053 sleep/22695 syscalls:sys_enter_mmap(len: 1857472, prot: READ, flags: PRIVATE|DENYWRITE, fd: 3)
         0.069 sleep/22695 syscalls:sys_enter_mmap(addr:
0x7fd23ffb6000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000)
         0.077 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240103000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000)
         0.083 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240150000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000)
         0.095 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240156000, len: 14272, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS)
         0.339 sleep/22695 syscalls:sys_enter_mmap(len: 217750512, prot: READ, flags: PRIVATE, fd: 3)
    #

    Works with all targets, so, for system wide, looking at who calls mmap with flags set to just "PRIVATE":

    # perf trace --max-events=5 -e syscalls:sys_enter_mmap --filter="flags==PRIVATE"
         0.000 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14)
         0.050 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14)
         0.062 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14)
         0.145 goa-identity-s/2240 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 18)
         0.183 goa-identity-s/2240 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 18)
    #

    # perf trace --max-events=2 -e syscalls:sys_enter_lseek --filter="whence==SET && offset != 0"
         0.000 Cache2 I/O/12047 syscalls:sys_enter_lseek(fd: 277, offset: 43, whence: SET)
      1142.070 mozStorage #5/12302 syscalls:sys_enter_lseek(fd: 44</home/acme/.mozilla/firefox/ina67tev.default/cookies.sqlite-wal>, offset: 393536, whence: SET)
    #

perf annotate:

  - Fix objdump --no-show-raw-insn flag to work with both gcc and clang.

  - Streamline objdump execution, preserving the right error codes for better
    reporting to user.

perf report:

  - Add warning when libunwind not compiled in.

perf stat:

  Jin Yao:

  - Support --all-kernel/--all-user, to match options available in 'perf record',
    asking that all the events specified work just with kernel or user events.
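The strtoul_flags idea described under perf trace above (going from or-ed flag names to a number so a filter such as `flags==DENYWRITE|PRIVATE|FIXED` can be compared numerically) can be sketched in a few lines of Python. This is only an illustration of the technique, not perf's C implementation: `MMAP_FLAGS`, `strtoul_flags()` and `rewrite_filter()` are hypothetical names, and the table values are the x86 Linux `MAP_*` constants, assumed here rather than taken from perf's generated tables.

```python
# Hypothetical name -> value table, using the names 'perf trace' prints.
# Values assumed from the x86 Linux MAP_* definitions.
MMAP_FLAGS = {
    "SHARED": 0x01,
    "PRIVATE": 0x02,
    "FIXED": 0x10,
    "ANONYMOUS": 0x20,
    "DENYWRITE": 0x0800,
}


def strtoul_flags(text, table):
    """Turn an or-ed list of flag names such as "PRIVATE|FIXED" into a number."""
    value = 0
    for name in text.split("|"):
        name = name.strip()
        if name not in table:
            raise ValueError(f"unknown flag name: {name!r}")
        value |= table[name]
    return value


def rewrite_filter(field, text, table):
    """Build a numeric equality expression from a symbolic one."""
    return f"{field}=={strtoul_flags(text, table):#x}"
```

With this table, `rewrite_filter("flags", "DENYWRITE|PRIVATE|FIXED", MMAP_FLAGS)` yields `flags==0x812`. The order of the names does not matter, which matches the example output above printing the same flags as `PRIVATE|FIXED|DENYWRITE`.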
perf list:

  Jin Yao:

  - Hide deprecated events by default, allow showing them with --deprecated.

libperf:

  Jiri Olsa:

  - Allow to build with -ltcmalloc.

  - Finish mmap interface, getting more stuff from tools/perf while adding
    abstractions to avoid pulling too much stuff, to get libperf to grow as
    tools needs things like auxtrace, etc.

perf scripting engines:

  Steven Rostedt (VMware):

  - Iterate on tep event arrays directly, fixing script generation with
    '-g python' when having multiple tracepoints in a perf.data file.

core:

  - Allow to build with -ltcmalloc.

perf test:

  Leo Yan:

  - Report failure for mmap events.

  - Avoid infinite loop for task exit case.

  - Remove needless headers for bp_account test.

  - Add dedicated checking helper is_supported().

  - Disable bp_signal testing for arm64.

Vendor events:

  arm64:

    John Garry:

    - Fix Hisi hip08 DDRC PMU eventname.

    - Add some missing events for Hisi hip08 DDRC, L3C and HHA PMUs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Andi Kleen (2):
      perf script: Fix --reltime with --time
      perf evlist: Fix fix for freed id arrays

Arnaldo Carvalho de Melo (25):
      perf trace: Add syscall failure stats to -s/--summary and -S/--with-summary
      perf trace: Introduce --errno-summary
      perf string: Export asprintf__tp_filter_pids()
      perf trace: Filter own pid to avoid a feedback look in 'perf trace record -a'
      perf trace: Support tracepoint dynamic char arrays
      tools arch x86: Grab a copy of the file containing the IRQ vector defines
      libbeauty: Add a generator for x86's IRQ vectors -> strings
      libbeauty: Hook up the x86 irq_vectors table generator
      libbeauty: Add a strarray__scnprintf_suffix() method
      perf trace beauty: Add the glue for the autogenerated x86 IRQ vector array
      perf trace: Hook the 'vec' tracepoint argument with the x86 IRQ vectors scnprintf/strtoul
      perf trace: Show error message when not finding a field used in a filter expression
      perf trace: Introduce accessors to trace
specific evsel->priv perf trace: Hide evsel->access further, simplify code perf trace: Introduce 'struct evsel__trace' for evsel->priv needs perf trace: Initialize evsel_trace->fmt for syscalls:sys_enter_* tracepoints libbeauty: Introduce syscall_arg__strtoul_strarray() perf trace: Honour --max-events in processing syscalls:sys_enter_* perf trace: Pass a syscall_arg to syscall_arg_fmt->strtoul() libbeauty: Introduce syscall_arg__strtoul_strarrays() perf trace: Use strtoul for the fcntl 'cmd' argument libbeauty: Make the mmap_flags strarray visible outside of its beautifier libbeauty: Introduce strarray__strtoul_flags() perf trace: Wire up strarray__strtoul_flags() perf trace: Use STUL_STRARRAY_FLAGS with mmap Ian Rogers (5): perf annotate: Avoid reallocation in objdump parsing perf annotate: Use libsubcmd's run-command.h to fork objdump perf annotate: Don't pipe objdump output through 'grep' command perf annotate: Don't pipe objdump output through 'expand' command perf annotate: Fix objdump --no-show-raw-insn flag Jin Yao (3): perf report: Add warning when libunwind not compiled in perf stat: Support --all-kernel/--all-user perf list: Hide deprecated events by default Jiri Olsa (10): perf tools: Allow to build with -ltcmalloc libperf: Introduce perf_evlist__for_each_mmap() libperf: Move mmap allocation to perf_evlist__mmap_ops::get libperf: Move mask setup to perf_evlist__mmap_ops() libperf: Link static tests with libapi.a libperf: Add tests_mmap_thread test libperf: Add tests_mmap_cpus test libperf: Keep count of failed tests libperf: Do not export perf_evsel__init()/perf_evlist__init() libperf: Add pr_err() macro John Garry (4): perf vendor events arm64: Fix Hisi hip08 DDRC PMU eventname perf vendor events arm64: Add some missing events for Hisi hip08 DDRC PMU perf vendor events arm64: Add some missing events for Hisi hip08 L3C PMU perf vendor events arm64: Add some missing events for Hisi hip08 HHA PMU Leo Yan (5): perf test: Report failure for mmap events perf 
test: Avoid infinite loop for task exit case perf tests: Remove needless headers for bp_account perf tests bp_account: Add dedicated checking helper is_supported() perf tests: Disable bp_signal testing for arm64 Steven Rostedt (VMware) (2): perf scripting engines: Iterate on tep event arrays directly perf tools: Remove unused trace_find_next_event() Thomas Richter (1): perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy() tools/arch/x86/include/asm/irq_vectors.h | 146 +++++++ tools/perf/Documentation/perf-list.txt | 3 + tools/perf/Documentation/perf-stat.txt | 6 + tools/perf/Documentation/perf-trace.txt | 4 + tools/perf/Makefile.config | 5 + tools/perf/Makefile.perf | 10 + tools/perf/builtin-list.c | 14 +- tools/perf/builtin-report.c | 7 + tools/perf/builtin-script.c | 5 +- tools/perf/builtin-stat.c | 6 + tools/perf/builtin-trace.c | 420 ++++++++++++++++----- tools/perf/check-headers.sh | 1 + tools/perf/jvmti/Build | 6 +- tools/perf/lib/Makefile | 1 + tools/perf/lib/evlist.c | 71 +++- tools/perf/lib/include/internal/evlist.h | 3 + tools/perf/lib/include/internal/evsel.h | 1 + tools/perf/lib/include/internal/mmap.h | 5 +- tools/perf/lib/include/internal/tests.h | 20 +- tools/perf/lib/include/perf/core.h | 1 + tools/perf/lib/include/perf/evlist.h | 10 +- tools/perf/lib/include/perf/evsel.h | 2 - tools/perf/lib/internal.h | 3 + tools/perf/lib/libperf.map | 3 +- tools/perf/lib/mmap.c | 6 +- tools/perf/lib/tests/Makefile | 6 +- tools/perf/lib/tests/test-cpumap.c | 2 +- tools/perf/lib/tests/test-evlist.c | 219 ++++++++++- tools/perf/lib/tests/test-evsel.c | 2 +- tools/perf/lib/tests/test-threadmap.c | 2 +- .../arch/arm64/hisilicon/hip08/uncore-ddrc.json | 16 +- .../arch/arm64/hisilicon/hip08/uncore-hha.json | 23 +- .../arch/arm64/hisilicon/hip08/uncore-l3c.json | 56 +++ tools/perf/pmu-events/jevents.c | 26 +- tools/perf/pmu-events/jevents.h | 3 +- tools/perf/pmu-events/pmu-events.h | 1 + tools/perf/tests/bp_account.c | 20 +- tools/perf/tests/bp_signal.c | 15 
+- tools/perf/tests/builtin-test.c | 2 +- tools/perf/tests/task-exit.c | 9 + tools/perf/tests/tests.h | 1 + tools/perf/trace/beauty/beauty.h | 19 + tools/perf/trace/beauty/mmap.c | 4 +- tools/perf/trace/beauty/tracepoints/Build | 1 + .../trace/beauty/tracepoints/x86_irq_vectors.c | 29 ++ .../trace/beauty/tracepoints/x86_irq_vectors.sh | 27 ++ tools/perf/util/annotate.c | 196 ++++++---- tools/perf/util/evlist.c | 34 +- tools/perf/util/parse-events.c | 4 +- tools/perf/util/parse-events.h | 2 +- tools/perf/util/pmu.c | 17 +- tools/perf/util/pmu.h | 4 +- .../perf/util/scripting-engines/trace-event-perl.c | 8 +- .../util/scripting-engines/trace-event-python.c | 9 +- tools/perf/util/stat.c | 10 + tools/perf/util/stat.h | 2 + tools/perf/util/string2.h | 3 + tools/perf/util/time-utils.c | 27 +- tools/perf/util/time-utils.h | 5 + tools/perf/util/trace-event-parse.c | 31 -- tools/perf/util/trace-event.h | 2 - 61 files changed, 1307 insertions(+), 289 deletions(-) create mode 100644 tools/arch/x86/include/asm/irq_vectors.h create mode 100644 tools/perf/trace/beauty/tracepoints/x86_irq_vectors.c create mode 100755 tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. 
The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping having this in place. Clearlinux is failing when building with libpython, but that is not a perf regression, will try to remove one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf. # export PERF_TARBALL=http://192.168.124.1/perf/perf-5.4.0-rc3.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 
(tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 centos:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), clang version 7.0.1 (tags/RELEASE_701/final) 17 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20190930 gcc-9-branch@276275, clang version 8.0.0 (tags/RELEASE_800/final) 18 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 19 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 20 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 21 debian:experimental : Ok gcc (Debian 9.2.1-8) 9.2.1 20190909, clang version 8.0.1-3+b1 (tags/RELEASE_801/final) 22 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 24 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-19) 8.3.0 25 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 26 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 27 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 28 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 29 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 
(tags/RELEASE_381/final) 30 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 31 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 32 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 33 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 34 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 35 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 36 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 37 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 39 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc31) 40 fedora:32 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 41 fedora:rawhide : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 42 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 43 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 44 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 45 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 46 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 8.0.1 (tags/RELEASE_801/final) 47 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 48 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 7.0.1 
(tags/RELEASE_701/final 349238) 49 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 50 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190903 [gcc-9-branch revision 275330], clang version 8.0.1 (tags/RELEASE_801/final 366581) 51 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 52 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 53 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 54 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 55 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 56 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 57 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 62 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 63 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 64 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 
7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 74 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 75 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 76 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 77 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 78 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 79 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-8ubuntu1) 9.2.1 20190909, clang version 9.0.0-+rc5-1~exp1 (tags/RELEASE_900/rc5) # # uname -a Linux quaco 5.2.18-200.fc30.x86_64 #1 SMP Tue Oct 1 13:14:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 27198a893ba0 perf trace: Use STUL_STRARRAY_FLAGS with mmap # perf version --build-options perf version 5.4.rc3.g27198a893ba0 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # 
HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue 
generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . 
make_with_babeltrace_O: make LIBBABELTRACE=1 make_perf_o_O: make perf.o make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_no_libnuma_O: make NO_LIBNUMA=1 make_no_libperl_O: make NO_LIBPERL=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_install_prefix_O: make install prefix=/tmp/krava make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_util_map_o_O: make util/map.o make_no_gtk2_O: make NO_GTK2=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_pure_O: make make_no_libbpf_O: make NO_LIBBPF=1 make_clean_all_O: make clean all make_install_bin_O: make install-bin make_no_demangle_O: make NO_DEMANGLE=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_debug_O: make DEBUG=1 make_no_newt_O: make NO_NEWT=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_tags_O: make tags make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_doc_O: make doc make_no_backtrace_O: make NO_BACKTRACE=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_no_slang_O: make NO_SLANG=1 make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_help_O: make help make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 make_no_libelf_O: make NO_LIBELF=1 make_cscope_O: make cscope make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_install_O: make install make_with_clangllvm_O: make LIBCLANGLLVM=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2019-10-21 13:37 Arnaldo Carvalho de Melo @ 2019-10-21 23:16 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-10-21 23:16 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Andi Kleen, Brendan Gregg, Daniel Bristot de Oliveira, Ian Rogers, Jin Yao, John Garry, Leo Yan, Steven Rostedt, Thomas Richter, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit 39b656ee9f2ce41eb969c86525f9a2a63fefac5b: > > Merge tag 'perf-core-for-mingo-5.5-20191011' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-10-15 07:19:55 +0200) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191021 > > for you to fetch changes up to 27198a893ba074407e7a87e346252b3e6fab454f: > > perf trace: Use STUL_STRARRAY_FLAGS with mmap (2019-10-19 15:35:02 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > perf trace: > > - Add syscall failure stats to -s/--summary and -S/--with-summary, also works in > combination with specifying just a set of syscalls, see below first with > -s/--summary, then with -S/--with-summary just for the syscalls we saw failing > with -s: > > # perf trace -s sleep 1 > > Summary of events: > > sleep (16218), 80 events, 93.0% > > syscall calls errors total min avg max stddev > (msec) (msec) (msec) (msec) (%) > ----------- ----- ------ -------- -------- -------- -------- ------ > nanosleep 1 0 1000.091 1000.091 1000.091 1000.091 0.00% > mmap 8 0 0.045 0.005 0.006 0.008 7.09% > mprotect 4 0 0.028 0.005 0.007 0.009 11.38% > openat 3 
0 0.021 0.005 0.007 0.009 14.07% > munmap 1 0 0.017 0.017 0.017 0.017 0.00% > brk 4 0 0.010 0.001 0.002 0.004 23.15% > read 4 0 0.009 0.002 0.002 0.003 8.13% > close 5 0 0.008 0.001 0.002 0.002 10.83% > fstat 3 0 0.006 0.002 0.002 0.002 6.97% > access 1 1 0.006 0.006 0.006 0.006 0.00% > lseek 3 0 0.005 0.001 0.002 0.002 7.37% > arch_prctl 2 1 0.004 0.001 0.002 0.002 17.64% > execve 1 0 0.000 0.000 0.000 0.000 0.00% > > # perf trace -e access,arch_prctl -S sleep 1 > 0.000 ( 0.006 ms): sleep/19503 arch_prctl(option: 0x3001, arg2: 0x7fff165996b0) = -1 EINVAL (Invalid argument) > 0.024 ( 0.006 ms): sleep/19503 access(filename: 0x2177e510, mode: R) = -1 ENOENT (No such file or directory) > 0.136 ( 0.002 ms): sleep/19503 arch_prctl(option: SET_FS, arg2: 0x7f9421737580) = 0 > > Summary of events: > > sleep (19503), 6 events, 50.0% > > syscall calls errors total min avg max stddev > (msec) (msec) (msec) (msec) (%) > ---------- ----- ------ ------ ------ ------ ------ ------ > arch_prctl 2 1 0.008 0.002 0.004 0.006 57.22% > access 1 1 0.006 0.006 0.006 0.006 0.00% > > # > > - Introduce --errno-summary, to drill down a bit more in the errno stats: > > # perf trace --errno-summary -e access,arch_prctl -S sleep 1 > 0.000 ( 0.006 ms): sleep/5587 arch_prctl(option: 0x3001, arg2: 0x7ffd6ba6aa00) = -1 EINVAL (Invalid argument) > 0.028 ( 0.007 ms): sleep/5587 access(filename: 0xb83d9510, mode: R) = -1 ENOENT (No such file or directory) > 0.172 ( 0.003 ms): sleep/5587 arch_prctl(option: SET_FS, arg2: 0x7f45b8392580) = 0 > > Summary of events: > > sleep (5587), 6 events, 50.0% > > syscall calls errors total min avg max stddev > (msec) (msec) (msec) (msec) (%) > ---------- ----- ------ ------ ------ ------ ------ ------ > arch_prctl 2 1 0.009 0.003 0.005 0.006 38.90% > EINVAL: 1 > access 1 1 0.007 0.007 0.007 0.007 0.00% > ENOENT: 1 > # > > - Filter own pid to avoid a feedback loop in 'perf trace record -a' > > - Add the glue for the auto generated x86 IRQ vector array. 
> > - Show error message when not finding a field used in a filter expression > > # perf trace --max-events=4 -e syscalls:sys_enter_write --filter="cnt>32767" > Failed to set filter "(cnt>32767) && (common_pid != 19938 && common_pid != 8922)" on event syscalls:sys_enter_write with 22 (Invalid argument) > # > # perf trace --max-events=4 -e syscalls:sys_enter_write --filter="count>32767" > 0.000 python3.5/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0dc53600, count: 172086) > 12.641 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0db63660, count: 75994) > 27.738 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0db4b1e0, count: 41635) > 136.070 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0dbab510, count: 62232) > # > > - Add a generator for x86's IRQ vectors -> strings > > - Introduce strtoul() (string -> number) methods for the strarray and > strarrays classes, also strtoul_flags, allowing to go from both strings > and or-ed strings to numbers, allowing things like: > > # perf trace -e syscalls:sys_enter_mmap --filter="flags==DENYWRITE|PRIVATE|FIXED" sleep 1 > 0.000 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2aa5000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000) > 0.011 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2bf2000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000) > 0.015 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2c3f000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000) > # > > Allowing to narrow down from the complete set of mmap calls for that workload: > > # perf trace -e syscalls:sys_enter_mmap sleep 1 > 0.000 sleep/22695 syscalls:sys_enter_mmap(len: 134773, prot: READ, flags: PRIVATE, fd: 3) > 0.041 sleep/22695 syscalls:sys_enter_mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) > 0.053 sleep/22695 syscalls:sys_enter_mmap(len: 1857472, prot: READ, flags: PRIVATE|DENYWRITE, 
fd: 3) > 0.069 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd23ffb6000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000) > 0.077 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240103000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000) > 0.083 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240150000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000) > 0.095 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240156000, len: 14272, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) > 0.339 sleep/22695 syscalls:sys_enter_mmap(len: 217750512, prot: READ, flags: PRIVATE, fd: 3) > # > > Works with all targets, so, for system wide, looking at who calls mmap with flags set to just "PRIVATE": > > # perf trace --max-events=5 -e syscalls:sys_enter_mmap --filter="flags==PRIVATE" > 0.000 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14) > 0.050 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14) > 0.062 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14) > 0.145 goa-identity-s/2240 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 18) > 0.183 goa-identity-s/2240 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 18) > # > > # perf trace --max-events=2 -e syscalls:sys_enter_lseek --filter="whence==SET && offset != 0" > 0.000 Cache2 I/O/12047 syscalls:sys_enter_lseek(fd: 277, offset: 43, whence: SET) > 1142.070 mozStorage #5/12302 syscalls:sys_enter_lseek(fd: 44</home/acme/.mozilla/firefox/ina67tev.default/cookies.sqlite-wal>, offset: 393536, whence: SET) > # > > perf annotate: > > - Fix objdump --no-show-raw-insn flag to work with both gcc and clang. > > - Streamline objdump execution, preserving the right error codes for better > reporting to user. > > perf report: > > - Add warning when libunwind not compiled in. 
> > perf stat: > > Jin Yao: > > - Support --all-kernel/--all-user, to match options available in 'perf record', > asking that all the events specified work just with kernel or user events. > > perf list: > > Jin Yao: > > - Hide deprecated events by default, allow showing them with --deprecated. > > libperf: > > Jiri Olsa: > > - Allow to build with -ltcmalloc. > > - Finish mmap interface, getting more stuff from tools/perf while adding > abstractions to avoid pulling too much stuff, to get libperf to grow as > tools need things like auxtrace, etc. > > perf scripting engines: > > Steven Rostedt (VMware): > > - Iterate on tep event arrays directly, fixing script generation with > '-g python' when having multiple tracepoints in a perf.data file. > > core: > > - Allow to build with -ltcmalloc. > > perf test: > > Leo Yan: > > - Report failure for mmap events. > > - Avoid infinite loop for task exit case. > > - Remove needless headers for bp_account test. > > - Add dedicated checking helper is_supported(). > > - Disable bp_signal testing for arm64. > > Vendor events: > > arm64: > > John Garry: > > - Fix Hisi hip08 DDRC PMU eventname. > > - Add some missing events for Hisi hip08 DDRC, L3C and HHA PMUs. 
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Andi Kleen (2): > perf script: Fix --reltime with --time > perf evlist: Fix fix for freed id arrays > > Arnaldo Carvalho de Melo (25): > perf trace: Add syscall failure stats to -s/--summary and -S/--with-summary > perf trace: Introduce --errno-summary > perf string: Export asprintf__tp_filter_pids() > perf trace: Filter own pid to avoid a feedback look in 'perf trace record -a' > perf trace: Support tracepoint dynamic char arrays > tools arch x86: Grab a copy of the file containing the IRQ vector defines > libbeauty: Add a generator for x86's IRQ vectors -> strings > libbeauty: Hook up the x86 irq_vectors table generator > libbeauty: Add a strarray__scnprintf_suffix() method > perf trace beauty: Add the glue for the autogenerated x86 IRQ vector array > perf trace: Hook the 'vec' tracepoint argument with the x86 IRQ vectors scnprintf/strtoul > perf trace: Show error message when not finding a field used in a filter expression > perf trace: Introduce accessors to trace specific evsel->priv > perf trace: Hide evsel->access further, simplify code > perf trace: Introduce 'struct evsel__trace' for evsel->priv needs > perf trace: Initialize evsel_trace->fmt for syscalls:sys_enter_* tracepoints > libbeauty: Introduce syscall_arg__strtoul_strarray() > perf trace: Honour --max-events in processing syscalls:sys_enter_* > perf trace: Pass a syscall_arg to syscall_arg_fmt->strtoul() > libbeauty: Introduce syscall_arg__strtoul_strarrays() > perf trace: Use strtoul for the fcntl 'cmd' argument > libbeauty: Make the mmap_flags strarray visible outside of its beautifier > libbeauty: Introduce strarray__strtoul_flags() > perf trace: Wire up strarray__strtoul_flags() > perf trace: Use STUL_STRARRAY_FLAGS with mmap > > Ian Rogers (5): > perf annotate: Avoid reallocation in objdump parsing > perf annotate: Use libsubcmd's run-command.h to fork objdump > 
perf annotate: Don't pipe objdump output through 'grep' command > perf annotate: Don't pipe objdump output through 'expand' command > perf annotate: Fix objdump --no-show-raw-insn flag > > Jin Yao (3): > perf report: Add warning when libunwind not compiled in > perf stat: Support --all-kernel/--all-user > perf list: Hide deprecated events by default > > Jiri Olsa (10): > perf tools: Allow to build with -ltcmalloc > libperf: Introduce perf_evlist__for_each_mmap() > libperf: Move mmap allocation to perf_evlist__mmap_ops::get > libperf: Move mask setup to perf_evlist__mmap_ops() > libperf: Link static tests with libapi.a > libperf: Add tests_mmap_thread test > libperf: Add tests_mmap_cpus test > libperf: Keep count of failed tests > libperf: Do not export perf_evsel__init()/perf_evlist__init() > libperf: Add pr_err() macro > > John Garry (4): > perf vendor events arm64: Fix Hisi hip08 DDRC PMU eventname > perf vendor events arm64: Add some missing events for Hisi hip08 DDRC PMU > perf vendor events arm64: Add some missing events for Hisi hip08 L3C PMU > perf vendor events arm64: Add some missing events for Hisi hip08 HHA PMU > > Leo Yan (5): > perf test: Report failure for mmap events > perf test: Avoid infinite loop for task exit case > perf tests: Remove needless headers for bp_account > perf tests bp_account: Add dedicated checking helper is_supported() > perf tests: Disable bp_signal testing for arm64 > > Steven Rostedt (VMware) (2): > perf scripting engines: Iterate on tep event arrays directly > perf tools: Remove unused trace_find_next_event() > > Thomas Richter (1): > perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy() > > tools/arch/x86/include/asm/irq_vectors.h | 146 +++++++ > tools/perf/Documentation/perf-list.txt | 3 + > tools/perf/Documentation/perf-stat.txt | 6 + > tools/perf/Documentation/perf-trace.txt | 4 + > tools/perf/Makefile.config | 5 + > tools/perf/Makefile.perf | 10 + > tools/perf/builtin-list.c | 14 +- > 
tools/perf/builtin-report.c | 7 + > tools/perf/builtin-script.c | 5 +- > tools/perf/builtin-stat.c | 6 + > tools/perf/builtin-trace.c | 420 ++++++++++++++++----- > tools/perf/check-headers.sh | 1 + > tools/perf/jvmti/Build | 6 +- > tools/perf/lib/Makefile | 1 + > tools/perf/lib/evlist.c | 71 +++- > tools/perf/lib/include/internal/evlist.h | 3 + > tools/perf/lib/include/internal/evsel.h | 1 + > tools/perf/lib/include/internal/mmap.h | 5 +- > tools/perf/lib/include/internal/tests.h | 20 +- > tools/perf/lib/include/perf/core.h | 1 + > tools/perf/lib/include/perf/evlist.h | 10 +- > tools/perf/lib/include/perf/evsel.h | 2 - > tools/perf/lib/internal.h | 3 + > tools/perf/lib/libperf.map | 3 +- > tools/perf/lib/mmap.c | 6 +- > tools/perf/lib/tests/Makefile | 6 +- > tools/perf/lib/tests/test-cpumap.c | 2 +- > tools/perf/lib/tests/test-evlist.c | 219 ++++++++++- > tools/perf/lib/tests/test-evsel.c | 2 +- > tools/perf/lib/tests/test-threadmap.c | 2 +- > .../arch/arm64/hisilicon/hip08/uncore-ddrc.json | 16 +- > .../arch/arm64/hisilicon/hip08/uncore-hha.json | 23 +- > .../arch/arm64/hisilicon/hip08/uncore-l3c.json | 56 +++ > tools/perf/pmu-events/jevents.c | 26 +- > tools/perf/pmu-events/jevents.h | 3 +- > tools/perf/pmu-events/pmu-events.h | 1 + > tools/perf/tests/bp_account.c | 20 +- > tools/perf/tests/bp_signal.c | 15 +- > tools/perf/tests/builtin-test.c | 2 +- > tools/perf/tests/task-exit.c | 9 + > tools/perf/tests/tests.h | 1 + > tools/perf/trace/beauty/beauty.h | 19 + > tools/perf/trace/beauty/mmap.c | 4 +- > tools/perf/trace/beauty/tracepoints/Build | 1 + > .../trace/beauty/tracepoints/x86_irq_vectors.c | 29 ++ > .../trace/beauty/tracepoints/x86_irq_vectors.sh | 27 ++ > tools/perf/util/annotate.c | 196 ++++++---- > tools/perf/util/evlist.c | 34 +- > tools/perf/util/parse-events.c | 4 +- > tools/perf/util/parse-events.h | 2 +- > tools/perf/util/pmu.c | 17 +- > tools/perf/util/pmu.h | 4 +- > .../perf/util/scripting-engines/trace-event-perl.c | 8 +- > 
.../util/scripting-engines/trace-event-python.c | 9 +- > tools/perf/util/stat.c | 10 + > tools/perf/util/stat.h | 2 + > tools/perf/util/string2.h | 3 + > tools/perf/util/time-utils.c | 27 +- > tools/perf/util/time-utils.h | 5 + > tools/perf/util/trace-event-parse.c | 31 -- > tools/perf/util/trace-event.h | 2 - > 61 files changed, 1307 insertions(+), 289 deletions(-) > create mode 100644 tools/arch/x86/include/asm/irq_vectors.h > create mode 100644 tools/perf/trace/beauty/tracepoints/x86_irq_vectors.c > create mode 100755 tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh Pulled, thanks a lot Arnaldo! Ingo ^ permalink raw reply [flat|nested] 133+ messages in thread
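The or-ed string to number conversion that the quoted pull request describes for strtoul_flags (as used in --filter="flags==DENYWRITE|PRIVATE|FIXED") can be sketched roughly as below. This is only an illustration in Python, not perf's C implementation: the names MMAP_FLAGS, strtoul_flags() and build_filter() are hypothetical helpers made up for this sketch, and the table assumes the Linux x86-64 values of the corresponding MAP_* constants.

```python
# Hypothetical sketch, NOT the perf/libbeauty C code: resolve an or-ed list
# of flag names into the integer an in-kernel tracepoint filter would expect.

# Assumed table: Linux x86-64 values of MAP_SHARED, MAP_PRIVATE, MAP_FIXED,
# MAP_ANONYMOUS and MAP_DENYWRITE, keyed by the short names seen in the thread.
MMAP_FLAGS = {
    "SHARED": 0x01,
    "PRIVATE": 0x02,
    "FIXED": 0x10,
    "ANONYMOUS": 0x20,
    "DENYWRITE": 0x0800,
}


def strtoul_flags(expr, table):
    """Return the bitwise OR of every name in 'A|B|C', or None if a name is unknown."""
    value = 0
    for name in expr.split("|"):
        name = name.strip()
        if name not in table:
            return None  # unknown name: let the caller report the error
        value |= table[name]
    return value


def build_filter(field, expr, table):
    """Rewrite 'field==NAME|NAME' into the numeric form 'field==0x...'."""
    value = strtoul_flags(expr, table)
    if value is None:
        raise ValueError("unknown flag name in %r" % expr)
    return "%s==%#x" % (field, value)


if __name__ == "__main__":
    print(build_filter("flags", "DENYWRITE|PRIVATE|FIXED", MMAP_FLAGS))
```

Because the names are simply OR-ed together, their order in the expression does not matter, which is consistent with the quoted example where a filter written as DENYWRITE|PRIVATE|FIXED matches events printed as PRIVATE|FIXED|DENYWRITE.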
* [GIT PULL] perf/core improvements and fixes @ 2019-10-11 20:04 Arnaldo Carvalho de Melo 2019-10-15 5:25 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-10-11 20:04 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Andi Kleen, Björn Töpel, Ian Rogers, Jin Yao, John Garry, KP Singh, Arnaldo Carvalho de Melo Hi Ingo, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit f733c6b508bcaa3441ba1eacf16efb9abd47489f: perf/core: Fix inheritance of aux_output groups (2019-10-07 16:50:42 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191011 for you to fetch changes up to cebf7d51a6c3babc4d0589da7aec0de1af0a5691: perf diff: Report noisy for cycles diff (2019-10-11 10:57:00 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf trace: Arnaldo Carvalho de Melo: - Reuse the strace-like syscall_arg_fmt->scnprintf() beautification routines (convert integer arguments into strings, like open flags, etc) in tracepoint arguments. For now the type-based scnprintf routines (pid_t, umode_t, etc) and the ones based on well-known arg names ("fd", etc) get associated with tracepoint args of that type or name. A tracepoint-only arg, "msr", for the msr:{write,read}_msr tracepoints, gets added as an initial step. - Introduce syscall_arg_fmt->strtoul() methods to be the reverse operation of ->scnprintf(), i.e. to go from a string to an integer. 
- Implement --filter, just like in 'perf record', affecting the tracepoint events specified thus far in the command line. It uses the ->strtoul() methods to convert strings in tables associated with beautifiers into the integers the in-kernel tracepoint (eBPF later) filters expect, e.g.: # perf trace --max-events 1 -e sched:*ipi --filter="cpu==1 || cpu==2" 0.000 as/24630 sched:sched_wake_idle_without_ipi(cpu: 1) # # perf trace --max-events 1 --max-stack=32 -e msr:* --filter="msr==IA32_TSC_DEADLINE" 207.000 cc1/19963 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 5442316760822) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) lapic_next_deadline ([kernel.kallsyms]) clockevents_program_event ([kernel.kallsyms]) hrtimer_interrupt ([kernel.kallsyms]) smp_apic_timer_interrupt ([kernel.kallsyms]) apic_timer_interrupt ([kernel.kallsyms]) [0x6ff66c] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) [0x7047c3] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) [0x707708] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) execute_one_pass (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) [0x4f3d37] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) [0x4f3d49] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) execute_pass_list (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) cgraph_node::expand (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) [0x2625b4] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) symbol_table::finalize_compilation_unit (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) [0x5ae8b9] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) toplev::main (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) main (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1) [0x26b6a] (/usr/lib/x86_64-linux-gnu/libc-2.29.so) # # perf trace --max-events 8 -e msr:* --filter="msr==IA32_SPEC_CTRL" 0.000 :13281/13281 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) 0.063 migration/3/25 msr:write_msr(msr: IA32_SPEC_CTRL) 0.217 kworker/u16:1-/4826 msr:write_msr(msr: IA32_SPEC_CTRL) 0.687 rcu_sched/11 msr:write_msr(msr: IA32_SPEC_CTRL) 0.696 :13280/13280 
msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) 0.305 :13281/13281 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) 0.355 :13274/13274 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) 2.743 kworker/u16:0-/6711 msr:write_msr(msr: IA32_SPEC_CTRL) # # perf trace --max-events 8 --cpu 1 -e msr:* --filter="msr!=IA32_SPEC_CTRL && msr!=IA32_TSC_DEADLINE && msr != FS_BASE" 0.000 mtr-packet/30819 msr:write_msr(msr: 0x830, val: 68719479037) 0.096 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST) 238.925 mtr-packet/30819 msr:write_msr(msr: 0x830, val: 8589936893) 511.010 :0/0 msr:write_msr(msr: 0x830, val: 68719479037) 1005.052 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST) 1235.131 CPU 0/KVM/3750 msr:write_msr(msr: 0x830, val: 4294969595) 1235.195 CPU 0/KVM/3750 msr:read_msr(msr: IA32_SYSENTER_ESP, val: -2199023037952) 1235.201 CPU 0/KVM/3750 msr:read_msr(msr: IA32_APICBASE, val: 4276096000) # - Default to not using libtraceevent and its plugins for beautifying tracepoint arguments, since now we're reusing the strace-like beautifiers. Use --libtraceevent_print (using just --libtrace is unambiguous and can be used as a short hand) to go back to those beautifiers. This will help in the transition, as can be seen in some of the sched tracepoints that still need some work in the libbeauty based mode: # trace --no-inherit -e msr:*,*sleep,sched:* sleep 1 0.000 ( ): sched:sched_waking(comm: "trace", pid: 3319 (trace), prio: 120, success: 1) 0.006 ( ): sched:sched_wakeup(comm: "trace", pid: 3319 (trace), prio: 120, success: 1) 0.348 ( ): sched:sched_process_exec(filename: 140212596720100, pid: 3319 (sleep), old_pid: 3319 (sleep)) 0.490 ( ): msr:write_msr(msr: FS_BASE, val: 139631189321088) 0.670 ( ): nanosleep(rqtp: 0x7ffc52c23bc0) ... 
0.674 ( ): sched:sched_stat_runtime(comm: "sleep", pid: 3319 (sleep), runtime: 659259, vruntime: 78942418342) 0.675 ( ): sched:sched_switch(prev_comm: "sleep", prev_pid: 3319 (sleep), prev_prio: 120, prev_state: 1, next_comm: "swapper/0", next_prio: 120) 1001.059 ( ): sched:sched_waking(comm: "sleep", pid: 3319 (sleep), prio: 120, success: 1) 1001.098 ( ): sched:sched_wakeup(comm: "sleep", pid: 3319 (sleep), prio: 120, success: 1) 0.670 (1000.504 ms): ... [continued]: nanosleep()) = 0 1001.456 ( ): sched:sched_process_exit(comm: "sleep", pid: 3319 (sleep), prio: 120) # trace --libtrace --no-inherit -e msr:*,*sleep,sched:* sleep 1 0.000 ( ): sched:sched_waking(comm=trace pid=3323 prio=120 target_cpu=000) 0.007 ( ): sched:sched_wakeup(comm=trace pid=3323 prio=120 target_cpu=000) 0.382 ( ): sched:sched_process_exec(filename=/usr/bin/sleep pid=3323 old_pid=3323) 0.525 ( ): msr:write_msr(c0000100, value 7f5d508a0580) 0.713 ( ): nanosleep(rqtp: 0x7fff487fb4a0) ... 0.717 ( ): sched:sched_stat_runtime(comm=sleep pid=3323 runtime=617722 [ns] vruntime=78957731636 [ns]) 0.719 ( ): sched:sched_switch(prev_comm=sleep prev_pid=3323 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120) 1001.117 ( ): sched:sched_waking(comm=sleep pid=3323 prio=120 target_cpu=000) 1001.157 ( ): sched:sched_wakeup(comm=sleep pid=3323 prio=120 target_cpu=000) 0.713 (1000.522 ms): ... [continued]: nanosleep()) = 0 1001.538 ( ): sched:sched_process_exit(comm=sleep pid=3323 prio=120) # - Make -v (verbose) mode be honoured for .perfconfig based trace.add_events, to help in diagnosing problems with building eBPF events (-e source.c). - When using eBPF syscall payload augmentation do not show strace-like syscalls when all the user specified was some tracepoint event, bringing the behaviour in line with that of when not using eBPF augmentation. 
Intel PT:

  exported-sql-viewer GUI:

  Adrian Hunter:

  - Add LookupModel, HBoxLayout, VBoxLayout, global time range calculations so as to add a time chart by CPU.

perf script:

  Andi Kleen:

  - Allow --time (to specify a time span of interest) with --reltime (timestamps relative to start).

perf diff:

  Jin Yao:

  - Report noise for cycles diff, i.e. a histogram + stddev.

perf annotate:

  Arnaldo Carvalho de Melo:

  - Initialize env->cpuid when running in live mode (perf top), as it is used in some of the per arch annotation init routines.

samples bpf:

  Björn Töpel:

  - Fixup fallout of using tools/perf/perf-sys.h from outside tools/perf.

Core:

  Ian Rogers:

  - Avoid 'sample_reg_masks' being const + weak, as this breaks with some compilers that constant-propagate from the weak symbol.

libperf:

  - First part of moving the perf_mmap class from tools/perf to libperf.

  - Propagate CFLAGS to libperf from the tools/perf Makefile.

Vendor events:

  John Garry:

  - Add entry in MAINTAINERS with reviewers for the perf tool arm64 pmu-events files.
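To make the "histogram + stddev" wording of the perf diff item above more concrete, here is a rough Python sketch of the two pieces. The real code is C (the diffstat lists tools/perf/builtin-diff.c and tools/perf/util/spark.c); everything below is an assumption for illustration: the function names, the use of the sample (n - 1) standard deviation, and the eight-level block-character scale.

```python
import math

# Eight block characters, lowest to highest (assumed scale for this sketch).
TICKS = "▁▂▃▄▅▆▇█"


def stddev(values):
    """Sample standard deviation; 0.0 when fewer than two values."""
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def sparkline(values):
    """Render one tick per value, scaled between min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return TICKS[0] * len(values)
    top = len(TICKS) - 1
    return "".join(TICKS[int((v - lo) / span * top)] for v in values)


if __name__ == "__main__":
    # Hypothetical per-sample cycle differences for one basic block.
    diffs = [12, 15, 11, 30, 14, 13]
    print(f"{sparkline(diffs)}  ± {stddev(diffs):.1f}")
```

A large stddev relative to the reported cycles diff suggests the difference for that block is noise rather than a real change, which is the reason for showing both next to each other.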
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Adrian Hunter (6): perf scripts python: exported-sql-viewer.py: Add LookupModel() perf scripts python: exported-sql-viewer.py: Add HBoxLayout and VBoxLayout perf scripts python: exported-sql-viewer.py: Add global time range calculations perf scripts python: exported-sql-viewer.py: Tidy up Call tree call_time perf scripts python: exported-sql-viewer.py: Add ability for Call tree to open at a specified task and time perf scripts python: exported-sql-viewer.py: Add Time chart by CPU Andi Kleen (1): perf script: Allow --time with --reltime Arnaldo Carvalho de Melo (30): perf env: Add routine to read the env->cpuid from the running machine perf top: Initialize perf_env->cpuid, needed by the per arch annotation init routine perf evlist: Adopt __set_tracepoint_handlers method from perf_session perf trace: Make evlist__set_evsel_handler() affect just entries without a handler perf trace: Separate 'struct syscall_fmt' definition from syscall_fmts variable perf trace: Generalize the syscall_fmt find routines perf trace: Postpone parsing .perfconfig trace.add_events to after --verbose is processed perf trace augmented_syscalls: Do not show syscalls when none was asked for perf trace: Factor out the initialization of syscal_arg_fmt->scnprintf perf trace: Allocate an array of beautifiers for tracepoint args perf trace: Move some scnprintf methods from syscall to syscall_arg_fmt perf trace: Add the syscall_arg_fmt pointer to syscall_arg perf trace: Add array of chars scnprintf beautifier perf trace: Enclose all events argument lists with () perf trace: Allow choosing how to augment the tracepoint arguments tools arch x86: Grab a copy of the file containing the MSR numbers perf beauty: Make strarray's offset be u64 perf trace beauty: Add a x86 MSR cmd id->str table generator perf beauty: Hook up the x86 MSR table generator perf trace: Allow associating 
scnprintf routines with well known arg names perf trace beauty: Add the glue for the autogenerated MSR arrays perf trace: Associate the "msr" tracepoint arg name with x86_MSR__scnprintf() perf evlist: Factor out asprintf routine to build a tracepoint pid filter perf evlist: Introduce append_tp_filter() method perf evlist: Introduce append_tp_filter_pid() and append_tp_filter_pids() perf trace: Introduce --filter for tracepoint events perf trace: Add a strtoul() method to 'struct syscall_arg_fmt' perf trace: Introduce a strtoul() method for 'struct strarrays' perf trace: Expand strings in filters to integers perf beauty: Introduce strtoul() for x86 MSRs Björn Töpel (2): perf tools: Make usage of test_attr__* optional for perf-sys.h samples/bpf: fix build by setting HAVE_ATTR_TEST to zero Ian Rogers (1): perf tools: Avoid 'sample_reg_masks' being const + weak Jin Yao (1): perf diff: Report noisy for cycles diff Jiri Olsa (27): libperf: Add perf_mmap__init() function libperf: Add 'struct perf_mmap_param' libperf: Adopt perf_mmap__mmap_len() function from tools/perf libperf: Adopt perf_mmap__mmap() function from tools/perf libperf: Adopt perf_mmap__get() function from tools/perf libperf: Adopt perf_mmap__unmap() function from tools/perf libperf: Adopt perf_mmap__put() function from tools/perf perf tools: Use perf_mmap way to detect aux mmap libperf: Adopt perf_mmap__consume() function from tools/perf libperf: Adopt perf_mmap__read_init() from tools/perf libperf: Adopt perf_mmap__read_done() from tools/perf libperf: Adopt perf_mmap__read_event() from tools/perf libperf: Adopt perf_evlist__mmap()/munmap() from tools/perf libperf: Introduce perf_evlist__mmap_ops() libperf: Introduce perf_evlist_mmap_ops::idx callback libperf: Add perf_evlist_mmap_ops::get callback libperf: Introduce perf_evlist_mmap_ops::mmap callback perf tools: Introduce perf_evlist__mmap_cb_idx() perf evlist: Introduce perf_evlist__mmap_cb_get() perf evlist: Introduce perf_evlist__mmap_cb_mmap() perf 
evlist: Switch to libperf's mmap interface libperf: Centralize map refcnt setting libperf: Move the pollfd allocation from tools/perf to libperf libperf: Introduce perf_evlist__exit() libperf: Introduce perf_evlist__purge() libperf: Adopt perf_evlist__filter_pollfd() from tools/perf perf tools: Propagate CFLAGS to libperf John Garry (1): MAINTAINERS: Add entry for perf tool arm64 pmu-events files MAINTAINERS | 7 + samples/bpf/Makefile | 1 + tools/arch/x86/include/asm/msr-index.h | 857 ++++++++++++ tools/perf/Documentation/perf-config.txt | 5 + tools/perf/Documentation/perf-diff.txt | 5 + tools/perf/Documentation/perf-trace.txt | 10 + tools/perf/Makefile.config | 28 +- tools/perf/Makefile.perf | 11 +- tools/perf/arch/arm/util/Build | 2 + tools/perf/arch/arm/util/perf_regs.c | 6 + tools/perf/arch/arm64/util/Build | 1 + tools/perf/arch/arm64/util/perf_regs.c | 6 + tools/perf/arch/csky/util/Build | 2 + tools/perf/arch/csky/util/perf_regs.c | 6 + tools/perf/arch/riscv/util/Build | 2 + tools/perf/arch/riscv/util/perf_regs.c | 6 + tools/perf/arch/s390/util/Build | 1 + tools/perf/arch/s390/util/perf_regs.c | 6 + tools/perf/arch/x86/tests/perf-time-to-tsc.c | 9 +- tools/perf/builtin-diff.c | 143 ++ tools/perf/builtin-kvm.c | 11 +- tools/perf/builtin-record.c | 10 +- tools/perf/builtin-script.c | 5 - tools/perf/builtin-top.c | 20 +- tools/perf/builtin-trace.c | 593 +++++++-- tools/perf/check-headers.sh | 1 + tools/perf/lib/Build | 1 + tools/perf/lib/Makefile | 5 +- tools/perf/lib/core.c | 3 +- tools/perf/lib/evlist.c | 324 +++++ tools/perf/lib/include/internal/evlist.h | 40 + tools/perf/lib/include/internal/mmap.h | 44 +- tools/perf/lib/include/perf/core.h | 2 + tools/perf/lib/include/perf/evlist.h | 5 + tools/perf/lib/include/perf/mmap.h | 15 + tools/perf/lib/internal.h | 2 + tools/perf/lib/libperf.map | 7 + tools/perf/lib/mmap.c | 273 ++++ tools/perf/perf-sys.h | 6 +- tools/perf/scripts/python/exported-sql-viewer.py | 1555 +++++++++++++++++++++- 
tools/perf/tests/backward-ring-buffer.c | 7 +- tools/perf/tests/bpf.c | 7 +- tools/perf/tests/code-reading.c | 9 +- tools/perf/tests/keep-tracking.c | 9 +- tools/perf/tests/mmap-basic.c | 9 +- tools/perf/tests/openat-syscall-tp-fields.c | 9 +- tools/perf/tests/perf-record.c | 9 +- tools/perf/tests/sw-clock.c | 9 +- tools/perf/tests/switch-tracking.c | 9 +- tools/perf/tests/task-exit.c | 9 +- tools/perf/trace/beauty/Build | 1 + tools/perf/trace/beauty/beauty.h | 16 +- tools/perf/trace/beauty/tracepoints/Build | 1 + tools/perf/trace/beauty/tracepoints/x86_msr.c | 39 + tools/perf/trace/beauty/tracepoints/x86_msr.sh | 40 + tools/perf/util/Build | 1 + tools/perf/util/annotate.c | 4 + tools/perf/util/annotate.h | 2 + tools/perf/util/env.c | 16 + tools/perf/util/env.h | 1 + tools/perf/util/evlist.c | 322 ++--- tools/perf/util/evlist.h | 12 + tools/perf/util/mmap.c | 260 +--- tools/perf/util/mmap.h | 28 +- tools/perf/util/parse-regs-options.c | 8 +- tools/perf/util/perf_regs.c | 4 - tools/perf/util/perf_regs.h | 4 +- tools/perf/util/python.c | 7 +- tools/perf/util/session.c | 29 - tools/perf/util/session.h | 6 +- tools/perf/util/sort.h | 4 + tools/perf/util/spark.c | 34 + tools/perf/util/spark.h | 8 + tools/perf/util/symbol.h | 2 + 74 files changed, 4266 insertions(+), 705 deletions(-) create mode 100644 tools/arch/x86/include/asm/msr-index.h create mode 100644 tools/perf/arch/arm/util/perf_regs.c create mode 100644 tools/perf/arch/arm64/util/perf_regs.c create mode 100644 tools/perf/arch/csky/util/perf_regs.c create mode 100644 tools/perf/arch/riscv/util/perf_regs.c create mode 100644 tools/perf/arch/s390/util/perf_regs.c create mode 100644 tools/perf/lib/include/perf/mmap.h create mode 100644 tools/perf/lib/mmap.c create mode 100644 tools/perf/trace/beauty/tracepoints/Build create mode 100644 tools/perf/trace/beauty/tracepoints/x86_msr.c create mode 100755 tools/perf/trace/beauty/tracepoints/x86_msr.sh create mode 100644 tools/perf/util/spark.c create mode 100644 
tools/perf/util/spark.h

Test results:

The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from an http server to build them inside the container, to make it easier to build in a container cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to the lack of multi-arch devel packages, which are available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications, intercepting the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have them run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping to get this in place.

Clearlinux is failing when building with libpython, but that is not a perf regression; I will try to remove the one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf.
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.4.0-rc2.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 centos:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), clang version 7.0.1 (tags/RELEASE_701/final) 17 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20190930 gcc-9-branch@276275, clang version 8.0.0 (tags/RELEASE_800/final) 18 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 19 debian:9 : Ok gcc 
(Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 20 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 21 debian:experimental : Ok gcc (Debian 9.2.1-8) 9.2.1 20190909, clang version 8.0.1-3+b1 (tags/RELEASE_801/final) 22 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 24 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-19) 8.3.0 25 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 26 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 27 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 28 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 29 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 30 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 31 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 32 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 33 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 34 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 35 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 36 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 37 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 
2019.03-rc1) 8.3.1 20190225 39 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc31) 40 fedora:rawhide : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-1.fc32) 41 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 42 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 43 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 44 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 45 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 8.0.1 (tags/RELEASE_801/final) 46 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 47 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 7.0.1 (tags/RELEASE_701/final 349238) 48 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 49 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190903 [gcc-9-branch revision 275330], clang version 8.0.1 (tags/RELEASE_801/final 366581) 50 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 51 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 52 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 53 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 54 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 55 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 56 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 
5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 62 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 63 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 73 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 74 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 75 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 76 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 77 ubuntu:19.04-x-hppa : Ok 
hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 78 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-8ubuntu1) 9.2.1 20190909, clang version 9.0.0-+rc5-1~exp1 (tags/RELEASE_900/rc5) # # uname -a Linux quaco 5.2.17-200.fc30.x86_64 #1 SMP Mon Sep 23 13:42:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 cebf7d51a6c3 perf diff: Report noisy for cycles diff # perf version --build-options perf version 5.4.rc2.g32fdc2ca7e2a dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint 
overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname 
probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok # $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_debug_O: make DEBUG=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_no_newt_O: make NO_NEWT=1 make_help_O: make help make_no_libbpf_O: make NO_LIBBPF=1 make_no_demangle_O: make NO_DEMANGLE=1 make_perf_o_O: make perf.o make_pure_O: make make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_no_libelf_O: make NO_LIBELF=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_util_map_o_O: make util/map.o make_no_auxtrace_O: make NO_AUXTRACE=1 make_cscope_O: make cscope make_doc_O: make doc make_install_bin_O: make install-bin make_no_libperl_O: make NO_LIBPERL=1 make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 make_install_O: make install make_tags_O: make tags make_no_slang_O: make NO_SLANG=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_no_gtk2_O: make NO_GTK2=1 make_clean_all_O: make clean all make_with_babeltrace_O: make LIBBABELTRACE=1 make_install_prefix_O: make install prefix=/tmp/krava make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_libaudit_O: make NO_LIBAUDIT=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in 
thread
* Re: [GIT PULL] perf/core improvements and fixes 2019-10-11 20:04 Arnaldo Carvalho de Melo @ 2019-10-15 5:25 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread
From: Ingo Molnar @ 2019-10-15 5:25 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Andi Kleen, Björn Töpel, Ian Rogers, Jin Yao, John Garry, KP Singh, Arnaldo Carvalho de Melo

* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
>
> Please consider pulling,
>
> Best regards,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit f733c6b508bcaa3441ba1eacf16efb9abd47489f:
>
>   perf/core: Fix inheritance of aux_output groups (2019-10-07 16:50:42 +0200)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20191011
>
> for you to fetch changes up to cebf7d51a6c3babc4d0589da7aec0de1af0a5691:
>
>   perf diff: Report noisy for cycles diff (2019-10-11 10:57:00 -0300)

Pulled, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes @ 2019-09-26 0:31 Arnaldo Carvalho de Melo 2019-09-26 5:55 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-09-26 0:31 UTC (permalink / raw)
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Andi Kleen, Andreas Krebbel, Kim Phillips, Mamatha Inamdar, Stephane Eranian, Steven Rostedt, Thomas Richter, Tzvetomir Stoyanov, Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 2b32769700f857a8e608a8ee24080833889965b9:

  Merge tag 'perf-urgent-for-mingo-5.4-20190921' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2019-09-22 12:45:11 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20190925

for you to fetch changes up to d6840d87b2d148e19e244ad2b44d28ba07f437a0:

  perf parser: Remove needless include directives (2019-09-25 16:26:41 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf record:

  Stephane Eranian:

  - Fix priv level with branch sampling for paranoid=2, i.e. the kernel checks if perf_event_attr.exclude_hv is set in addition to .exclude_kernel, so reset both to zero.

  Arnaldo Carvalho de Melo:

  - Don't warn about not being able to read kernel maps (kallsyms, etc) when kernel samples aren't being collected.

perf list:

  Kim Phillips:

  - Allow plurals for metric, metricgroup, i.e.:

    $ perf list metrics

    was showing nothing, which is very confusing, make it work like:

    $ perf list metric

perf stat:

  Andi Kleen:

  - Fix free memory access/leaks detected via valgrind, related to metrics.

Libraries:

  libperf:

  Jiri Olsa:

  - Move more stuff from tools/perf, this time a first stab at moving perf_mmap methods.
  libtraceevent:

  Steven Rostedt (VMware):

  - Round up in tep_print_event() time precision.

  Tzvetomir Stoyanov (VMware):

  - Man pages for the event print related and plugins APIs.

  - Move traceevent plugins into their own subdirectory.

Feature detection:

  Thomas Richter:

  - Add detection of java-11-openjdk-devel package, in addition to the older versions supported.

Architecture specific:

  S/390:

  Thomas Richter:

  - Include JVMTI support for s390.

Vendor events:

  AMD:

  Kim Phillips:

  - Add L3 cache events for Family 17h.

  - Remove redundant '['.

  PowerPC:

  Mamatha Inamdar:

  - Remove P8 HW events which are not supported.

Cleanups:

  Arnaldo Carvalho de Melo:

  - Remove needless headers, add needed ones, move things around to reduce the headers dependency tree, speeding up builds by not doing needless compiles when unrelated stuff gets changed.

  - Ditch unused code that was dragging headers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Andi Kleen (2):
  perf stat: Fix free memory access / memory leaks in metrics
  perf evlist: Fix access of freed id arrays

Arnaldo Carvalho de Melo (12):
  perf record: Move restricted maps check to after a possible fallback to not collect kernel samples
  perf evlist: Adopt backwards ring buffer state enum
  libperf: Add missing 'struct xyarray' forward declaration
  perf tools: No need to include internal/lib.h from util/util.h
  libperf: Use sys/types.h to get ssize_t, not unistd.h
  perf copyfile: Move copyfile routines to separate files
  perf evsel: Remove need for symbol_conf in evsel_fprintf.c
  perf evsel: Introduce evsel_fprintf.h
  perf evlist: Remove unused perf_evlist__fprintf() method
  perf evsel: Move config terms to a separate header
  perf tools: Replace needless mmap.h with what is needed, event.h
  perf parser: Remove needless include directives

Jiri Olsa (37):
  tools: Add missing stdio.h include to asm/bug.h header
  perf tools: Rename 'struct perf_mmap' to 'struct mmap'
  perf tools: Rename
perf_evlist__mmap() to evlist__mmap() perf tools: Rename perf_evlist__munmap() to evlist__munmap() perf tools: Rename perf_evlist__alloc_mmap() to evlist__alloc_mmap() perf tools: Rename perf_evlist__exit() to evlist__exit() perf tools: Rename perf_evlist__purge() to evlist__purge() libperf: Link libapi.a in libperf.so libperf: Add perf_mmap struct libperf: Add 'mask' to struct perf_mmap libperf: Add 'fd' to struct perf_mmap libperf: Add 'cpu' to struct perf_mmap libperf: Add 'refcnt' to struct perf_mmap libperf: Add prev/start/end to struct perf_mmap libperf: Add 'overwrite' to 'struct perf_mmap' libperf: Add 'event_copy' to 'struct perf_mmap' libperf: Add 'flush' to 'struct perf_mmap' libperf: Move 'system_wide' from 'struct evsel' to 'struct perf_evsel' libperf: Move 'nr_mmaps' from 'struct evlist' to 'struct perf_evlist' libperf: Move 'mmap_len' from 'struct evlist' to 'struct perf_evlist' libperf: Move 'pollfd' from 'struct evlist' to 'struct perf_evlist' libperf: Move 'sample_id' from 'struct evsel' to 'struct perf_evsel' libperf: Move 'id' from 'struct evsel' to 'struct perf_evsel' libperf: Move 'ids' from 'struct evsel' to 'struct perf_evsel' libperf: Move 'heads' from 'struct evlist' to 'struct perf_evlist' libperf: Add perf_evsel__alloc_id/perf_evsel__free_id functions libperf: Add perf_evlist__first()/last() functions libperf: Add perf_evlist__read_format() function libperf: Add perf_evlist__id_add() function libperf: Add perf_evlist__id_add_fd() function libperf: Move 'page_size' global variable to libperf libperf: Add libperf dependency for tests targets libperf: Merge libperf_set_print() into libperf_init() libperf: Add libperf_init() call to the tests libperf: Add perf_evlist__alloc_pollfd() function libperf: Add perf_evlist__add_pollfd() function libperf: Add perf_evlist__poll() function Kim Phillips (4): perf vendor events amd: Add L3 cache events for Family 17h perf vendor events amd: Remove redundant '[' perf vendor events: Minor fixes to the 
README perf list: Allow plurals for metric, metricgroup Mamatha Inamdar (1): perf vendor events: Remove P8 HW events which are not supported Stephane Eranian (1): perf record: Fix priv level with branch sampling for paranoid=2 Steven Rostedt (VMware) (1): libtraceevent: Round up in tep_print_event() time precision Thomas Richter (2): perf jvmti: Include JVMTI support for s390 perf build: Add detection of java-11-openjdk-devel package Tzvetomir Stoyanov (2): libtraceevent: Man pages for libtraceevent event print related API libtraceevent: Man pages for tep plugins APIs Tzvetomir Stoyanov (VMware) (4): libtraceevent: Man pages fix, rename tep_ref_get() to tep_get_ref() libtraceevent: Man pages fix, changes in event printing APIs libtraceevent: Add tep_get_event() in event-parse.h libtraceevent: Move traceevent plugins in its own subdirectory tools/include/asm/bug.h | 1 + tools/lib/traceevent/Build | 11 - .../Documentation/libtraceevent-event_print.txt | 130 +++++++++ .../Documentation/libtraceevent-handle.txt | 8 +- .../Documentation/libtraceevent-plugins.txt | 99 +++++++ .../lib/traceevent/Documentation/libtraceevent.txt | 15 +- tools/lib/traceevent/Makefile | 94 ++----- tools/lib/traceevent/event-parse.c | 4 +- tools/lib/traceevent/event-parse.h | 2 + tools/lib/traceevent/plugins/Build | 10 + tools/lib/traceevent/plugins/Makefile | 222 ++++++++++++++++ .../lib/traceevent/{ => plugins}/plugin_cfg80211.c | 0 .../lib/traceevent/{ => plugins}/plugin_function.c | 0 .../lib/traceevent/{ => plugins}/plugin_hrtimer.c | 0 tools/lib/traceevent/{ => plugins}/plugin_jbd2.c | 0 tools/lib/traceevent/{ => plugins}/plugin_kmem.c | 0 tools/lib/traceevent/{ => plugins}/plugin_kvm.c | 0 .../lib/traceevent/{ => plugins}/plugin_mac80211.c | 0 .../traceevent/{ => plugins}/plugin_sched_switch.c | 0 tools/lib/traceevent/{ => plugins}/plugin_scsi.c | 0 tools/lib/traceevent/{ => plugins}/plugin_xen.c | 0 tools/perf/Makefile.config | 2 +- tools/perf/Makefile.perf | 4 +- 
tools/perf/arch/arm/util/cs-etm.c | 7 +- tools/perf/arch/arm64/util/arm-spe.c | 6 +- tools/perf/arch/s390/Makefile | 1 + tools/perf/arch/s390/util/auxtrace.c | 1 + tools/perf/arch/s390/util/machine.c | 2 +- tools/perf/arch/x86/tests/intel-cqm.c | 5 +- tools/perf/arch/x86/tests/perf-time-to-tsc.c | 11 +- tools/perf/arch/x86/tests/rdpmc.c | 2 +- tools/perf/arch/x86/util/intel-bts.c | 9 +- tools/perf/arch/x86/util/intel-pt.c | 17 +- tools/perf/arch/x86/util/machine.c | 2 +- tools/perf/builtin-evlist.c | 1 + tools/perf/builtin-kvm.c | 13 +- tools/perf/builtin-list.c | 4 +- tools/perf/builtin-record.c | 102 +++---- tools/perf/builtin-sched.c | 3 +- tools/perf/builtin-script.c | 11 +- tools/perf/builtin-stat.c | 6 +- tools/perf/builtin-top.c | 22 +- tools/perf/builtin-trace.c | 17 +- tools/perf/lib/Makefile | 35 ++- tools/perf/lib/core.c | 13 +- tools/perf/lib/evlist.c | 124 +++++++++ tools/perf/lib/evsel.c | 30 +++ tools/perf/lib/include/internal/evlist.h | 33 +++ tools/perf/lib/include/internal/evsel.h | 33 +++ tools/perf/lib/include/internal/lib.h | 4 +- tools/perf/lib/include/internal/mmap.h | 32 +++ tools/perf/lib/include/perf/core.h | 2 +- tools/perf/lib/include/perf/evlist.h | 1 + tools/perf/lib/lib.c | 2 + tools/perf/lib/libperf.map | 3 +- tools/perf/lib/tests/test-cpumap.c | 10 + tools/perf/lib/tests/test-evlist.c | 10 + tools/perf/lib/tests/test-evsel.c | 10 + tools/perf/lib/tests/test-threadmap.c | 10 + tools/perf/perf.c | 13 +- tools/perf/pmu-events/README | 22 +- .../perf/pmu-events/arch/powerpc/power8/other.json | 24 -- .../perf/pmu-events/arch/x86/amdfam17h/cache.json | 42 +++ tools/perf/pmu-events/arch/x86/amdfam17h/core.json | 2 +- tools/perf/pmu-events/jevents.c | 1 + tools/perf/tests/backward-ring-buffer.c | 11 +- tools/perf/tests/bpf.c | 9 +- tools/perf/tests/code-reading.c | 11 +- tools/perf/tests/event-times.c | 14 +- tools/perf/tests/event_update.c | 6 +- tools/perf/tests/evsel-roundtrip-name.c | 2 +- tools/perf/tests/hists_cumulate.c | 2 +- 
tools/perf/tests/hists_link.c | 5 +- tools/perf/tests/hists_output.c | 2 +- tools/perf/tests/keep-tracking.c | 11 +- tools/perf/tests/mmap-basic.c | 5 +- tools/perf/tests/mmap-thread-lookup.c | 2 +- tools/perf/tests/openat-syscall-tp-fields.c | 11 +- tools/perf/tests/parse-events.c | 116 ++++---- tools/perf/tests/perf-record.c | 13 +- tools/perf/tests/sdt.c | 1 + tools/perf/tests/sw-clock.c | 5 +- tools/perf/tests/switch-tracking.c | 29 +- tools/perf/tests/task-exit.c | 9 +- tools/perf/tests/vmlinux-kallsyms.c | 2 +- tools/perf/ui/browsers/hists.c | 6 +- tools/perf/ui/gtk/hists.c | 1 + tools/perf/util/Build | 2 + tools/perf/util/annotate.c | 1 + tools/perf/util/auxtrace.c | 8 +- tools/perf/util/auxtrace.h | 8 +- tools/perf/util/bpf-loader.c | 2 +- tools/perf/util/build-id.c | 3 +- tools/perf/util/copyfile.c | 144 ++++++++++ tools/perf/util/copyfile.h | 16 ++ tools/perf/util/cs-etm.c | 2 +- tools/perf/util/evlist.c | 295 ++++++--------------- tools/perf/util/evlist.h | 81 +++--- tools/perf/util/evsel.c | 204 ++------------ tools/perf/util/evsel.h | 121 +-------- tools/perf/util/evsel_config.h | 50 ++++ tools/perf/util/evsel_fprintf.c | 15 +- tools/perf/util/evsel_fprintf.h | 50 ++++ tools/perf/util/genelf.h | 3 + tools/perf/util/header.c | 29 +- tools/perf/util/intel-bts.c | 4 +- tools/perf/util/intel-pt.c | 10 +- tools/perf/util/jitdump.c | 2 +- tools/perf/util/machine.c | 1 + tools/perf/util/mmap.c | 185 ++++++------- tools/perf/util/mmap.h | 77 ++---- tools/perf/util/parse-events.c | 8 +- tools/perf/util/parse-events.y | 4 +- tools/perf/util/perf_event_attr_fprintf.c | 148 +++++++++++ tools/perf/util/python-ext-sources | 1 + tools/perf/util/python.c | 24 +- tools/perf/util/record.c | 6 +- tools/perf/util/session.c | 5 +- tools/perf/util/sort.c | 2 +- tools/perf/util/srccode.c | 2 +- tools/perf/util/stat-shadow.c | 4 +- tools/perf/util/stat.c | 2 +- tools/perf/util/symbol-elf.c | 2 +- tools/perf/util/synthetic-events.c | 20 +- tools/perf/util/top.c | 2 +- 
tools/perf/util/trace-event-info.c | 2 +- tools/perf/util/util.c | 136 ---------- tools/perf/util/util.h | 8 - 128 files changed, 1941 insertions(+), 1321 deletions(-) create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-event_print.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-plugins.txt create mode 100644 tools/lib/traceevent/plugins/Build create mode 100644 tools/lib/traceevent/plugins/Makefile rename tools/lib/traceevent/{ => plugins}/plugin_cfg80211.c (100%) rename tools/lib/traceevent/{ => plugins}/plugin_function.c (100%) rename tools/lib/traceevent/{ => plugins}/plugin_hrtimer.c (100%) rename tools/lib/traceevent/{ => plugins}/plugin_jbd2.c (100%) rename tools/lib/traceevent/{ => plugins}/plugin_kmem.c (100%) rename tools/lib/traceevent/{ => plugins}/plugin_kvm.c (100%) rename tools/lib/traceevent/{ => plugins}/plugin_mac80211.c (100%) rename tools/lib/traceevent/{ => plugins}/plugin_sched_switch.c (100%) rename tools/lib/traceevent/{ => plugins}/plugin_scsi.c (100%) rename tools/lib/traceevent/{ => plugins}/plugin_xen.c (100%) create mode 100644 tools/perf/lib/include/internal/mmap.h create mode 100644 tools/perf/util/copyfile.c create mode 100644 tools/perf/util/copyfile.h create mode 100644 tools/perf/util/evsel_config.h create mode 100644 tools/perf/util/evsel_fprintf.h create mode 100644 tools/perf/util/perf_event_attr_fprintf.c Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. 
Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping to get this in place.

Clearlinux is failing when building with libpython, but that is not a perf regression; I will try to remove the one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf.
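For anyone wanting to tally the container build results that follow, here is a minimal sketch. It assumes one entry per line in the form "index image : status compiler-banner", a layout inferred from the log in this message rather than any documented format. The names parse_build_log and summarize are hypothetical helpers and are not part of perf or of the 'dm' script.

```python
import re
from collections import Counter

# Assumed line layout (inferred from the log in this mail, not a documented format):
#   <index> <image[:tag]> : <status> <compiler banner>
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+:\s+(\S+)\s*(.*)$")


def parse_build_log(text):
    """Return a list of (index, image, status, compilers) tuples."""
    results = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip shell prompts such as "# dm" and blank lines
        index, image, status, compilers = m.groups()
        results.append((int(index), image, status, compilers))
    return results


def summarize(results):
    """Count entries per status and list the images whose status is not 'Ok'."""
    counts = Counter(status for _, _, status, _ in results)
    failed = [image for _, image, status, _ in results if status != "Ok"]
    return counts, failed


# A few entries copied from the log, used only as sample input.
SAMPLE = """\
# dm
 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
21 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
"""

if __name__ == "__main__":
    counts, failed = summarize(parse_build_log(SAMPLE))
    print(dict(counts), failed)
```

Matching on the leading numeric index followed by whitespace keeps the 'perf test' lines (which use "N:") and the shell prompts out of the tally.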
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.3.0.tar.xz
# dm
1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1)
9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20190908 gcc-9-branch@275492, clang version 8.0.0 (tags/RELEASE_800/final)
17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
19 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final)
20 debian:experimental : Ok gcc (Debian 9.2.1-8) 9.2.1 20190909, clang version 8.0.1-3+b1 (tags/RELEASE_801/final)
21 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
22 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
23 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-19) 8.3.0
24 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
25 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
26 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
27 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
28 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
29 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
30 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
31 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
32 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
33 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
34 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
35 fedora:30 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30)
36 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
37 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
38 fedora:31 : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1)
39 fedora:rawhide : Ok gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.0 (Fedora 9.0.0-0.2.rc3.fc31)
40 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0
41 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
42 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
43 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7)
44 manjaro:latest : Ok gcc (GCC) 9.1.0, clang version 8.0.1 (tags/RELEASE_801/final)
45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548)
46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 7.0.1 (tags/RELEASE_701/final 349238)
47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553)
48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190820 [gcc-9-branch revision 274748], clang version 8.0.1 (tags/RELEASE_801/final 366581)
49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
51 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final)
52 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
53 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4)
54 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
55 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
56 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
57 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
58 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
59 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
60 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
61 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
62 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
63 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
64 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
65 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
66 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
67 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
68 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
69 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
70 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
71 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
72 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final)
73 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final)
74 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
75 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0
76 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
77 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-8ubuntu1) 9.2.1 20190909, clang version 9.0.0-+rc5-1~exp1 (tags/RELEASE_900/rc5)
#
# uname -a
Linux quaco 5.2.17-200.fc30.x86_64 #1 SMP Mon Sep 23 13:42:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# git log --oneline -1
d6840d87b2d1 perf parser: Remove needless include directives
# perf version --build-options
perf version 5.3.gd6840d87b2d1
dwarf: [ on ] # HAVE_DWARF_SUPPORT
dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT
glibc: [ on ] # HAVE_GLIBC_SUPPORT
gtk2: [ on ] # HAVE_GTK2_SUPPORT
syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT
libbfd: [ on ] # HAVE_LIBBFD_SUPPORT
libelf: [ on ] # HAVE_LIBELF_SUPPORT
libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
libperl: [ on ] # HAVE_LIBPERL_SUPPORT
libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
libslang: [ on ] # HAVE_SLANG_SUPPORT
libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT
libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT
libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT
zlib: [ on ] # HAVE_ZLIB_SUPPORT
lzma: [ on ] # HAVE_LZMA_SUPPORT
get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
aio: [ on ] # HAVE_AIO_SUPPORT
zstd: [ on ] # HAVE_ZSTD_SUPPORT
# perf test
1: vmlinux symtab matches kallsyms : Ok
2: Detect openat syscall event : Ok
3: Detect openat syscall event on all cpus : Ok
4: Read samples using the mmap interface : Ok
5: Test data source output : Ok
6: Parse event definition strings : Ok
7: Simple expression parser : Ok
8: PERF_RECORD_* events & perf_sample fields : Ok
9: Parse perf pmu format : Ok
10: DSO data read : Ok
11: DSO data cache : Ok
12: DSO data reopen : Ok
13: Roundtrip evsel->name : Ok
14: Parse sched tracepoints fields : Ok
15: syscalls:sys_enter_openat event fields : Ok
16: Setup struct perf_event_attr : Ok
17: Match and link multiple hists : Ok
18: 'import perf' in python : Ok
19: Breakpoint overflow signal handler : Ok
20: Breakpoint overflow sampling : Ok
21: Breakpoint accounting : Ok
22: Watchpoint :
22.1: Read Only Watchpoint : Skip
22.2: Write Only Watchpoint : Ok
22.3: Read / Write Watchpoint : Ok
22.4: Modify Watchpoint : Ok
23: Number of exit events of a simple workload : Ok
24: Software clock events period values : Ok
25: Object code reading : Ok
26: Sample parsing : Ok
27: Use a dummy software event to keep tracking : Ok
28: Parse with no sample_id_all bit set : Ok
29: Filter hist entries : Ok
30: Lookup mmap thread : Ok
31: Share thread mg : Ok
32: Sort output of hist entries : Ok
33: Cumulate child hist entries : Ok
34: Track with sched_switch : Ok
35: Filter fds with revents mask in a fdarray : Ok
36: Add fd to a fdarray, making it autogrow : Ok
37: kmod_path__parse : Ok
38: Thread map : Ok
39: LLVM search and compile :
39.1: Basic BPF llvm compile : Ok
39.2: kbuild searching : Ok
39.3: Compile source for BPF prologue generation : Ok
39.4: Compile source for BPF relocation : Ok
40: Session topology : Ok
41: BPF filter :
41.1: Basic BPF filtering : Ok
41.2: BPF pinning : Ok
41.3: BPF prologue generation : Ok
41.4: BPF relocation checker : Ok
42: Synthesize thread map : Ok
43: Remove thread map : Ok
44: Synthesize cpu map : Ok
45: Synthesize stat config : Ok
46: Synthesize stat : Ok
47: Synthesize stat round : Ok
48: Synthesize attr update : Ok
49: Event times : Ok
50: Read backward ring buffer : Ok
51: Print cpu map : Ok
52: Probe SDT events : Ok
53: is_printable_array : Ok
54: Print bitmap : Ok
55: perf hooks : Ok
56: builtin clang support : Skip (not compiled in)
57: unit_number__scnprintf : Ok
58: mem2node : Ok
59: time utils : Ok
60: map_groups__merge_in : Ok
61: x86 rdpmc : Ok
62: Convert perf time to TSC : Ok
63: DWARF unwind : Ok
64: x86 instruction decoder - new instructions : Ok
65: Intel PT packet decoder : Ok
66: x86 bp modify : Ok
67: probe libc's inet_pton & backtrace it with ping : Ok
68: Use vfs_getname probe to get syscall args filenames : Ok
69: Add vfs_getname probe to get syscall args filenames : Ok
70: Check open filename arg using perf trace + vfs_getname: Ok
71: Zstd perf.data compression/decompression : Ok
#
$ make -C tools/perf build-test | tee /wb/build-test
make: Entering directory '/home/acme/git/perf/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
make_no_demangle_O: make NO_DEMANGLE=1
make_no_libbpf_O: make NO_LIBBPF=1
make_install_O: make install
make_cscope_O: make cscope
make_no_auxtrace_O: make NO_AUXTRACE=1
make_no_libelf_O: make NO_LIBELF=1
make_perf_o_O: make perf.o
make_no_libpython_O: make NO_LIBPYTHON=1
make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1
make_no_slang_O: make NO_SLANG=1
make_no_gtk2_O: make NO_GTK2=1
make_tags_O: make tags
make_pure_O: make
make_util_map_o_O: make util/map.o
make_help_O: make help
make_no_libnuma_O: make NO_LIBNUMA=1
make_install_prefix_O: make install prefix=/tmp/krava
make_with_babeltrace_O: make LIBBABELTRACE=1
make_clean_all_O: make clean all
make_install_prefix_slash_O: make install prefix=/tmp/krava/
make_no_newt_O: make NO_NEWT=1
make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1
make_with_clangllvm_O: make LIBCLANGLLVM=1
make_util_pmu_bison_o_O: make util/pmu-bison.o
make_doc_O: make doc
make_no_libperl_O: make NO_LIBPERL=1
make_install_bin_O: make install-bin
make_no_libunwind_O: make NO_LIBUNWIND=1
make_no_backtrace_O: make NO_BACKTRACE=1
make_no_libbionic_O: make NO_LIBBIONIC=1
make_debug_O: make DEBUG=1
make_no_libaudit_O: make NO_LIBAUDIT=1
OK
make: Leaving directory '/home/acme/git/perf/tools/perf'
$

^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes
  2019-09-26  0:31 Arnaldo Carvalho de Melo
@ 2019-09-26  5:55 ` Ingo Molnar
  0 siblings, 0 replies; 133+ messages in thread
From: Ingo Molnar @ 2019-09-26  5:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams,
      linux-kernel, linux-perf-users, Andi Kleen, Andreas Krebbel,
      Kim Phillips, Mamatha Inamdar, Stephane Eranian, Steven Rostedt,
      Thomas Richter, Tzvetomir Stoyanov, Arnaldo Carvalho de Melo

* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo/Thomas,
>
>   Please consider pulling,
>
> Best regards,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit 2b32769700f857a8e608a8ee24080833889965b9:
>
>   Merge tag 'perf-urgent-for-mingo-5.4-20190921' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2019-09-22 12:45:11 +0200)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.5-20190925
>
> for you to fetch changes up to d6840d87b2d148e19e244ad2b44d28ba07f437a0:
>
>   perf parser: Remove needless include directives (2019-09-25 16:26:41 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> perf record:
>
>   Stephane Eranian:
>
>   - Fix priv level with branch sampling for paranoid=2, i.e. the kernel checks
>     if perf_event_attr.exclude_hv is set in addition to .exclude_kernel,
>     so reset both to zero.
>
>   Arnaldo Carvalho de Melo:
>
>   - Don't warn about not being able to read kernel maps (kallsyms, etc) when
>     kernel samples aren't being collected.
>
> perf list:
>
>   Kim Phillips:
>
>   - Allow plurals for metric, metricgroup, i.e.:
>
>       $ perf list metrics
>
>     was showing nothing, which is very confusing, make it work like:
>
>       $ perf list metric
>
> perf stat:
>
>   Andi Kleen:
>
>   - Free memory access/leaks detected via valgrind, related to metrics.
>
> Libraries:
>
>   libperf:
>
>     Jiri Olsa:
>
>     - Move more stuff from tools/perf, this time a first stab at moving perf_mmap
>       methods.
>
>   libtraceevent:
>
>     Steven Rostedt (VMware):
>
>     - Round up in tep_print_event() time precision.
>
>     Tzvetomir Stoyanov (VMware):
>
>     - Man pages for event print and related and plugins APIs.
>
>     - Move traceevent plugins in its own subdirectory.
>
> Feature detection:
>
>   Thomas Richter:
>
>   - Add detection of java-11-openjdk-devel package, in addition to the older
>     versions supported.
>
> Architecture specific:
>
>   S/390:
>
>     Thomas Richter (2):
>
>     - Include JVMTI support for s390
>
> Vendor events:
>
>   AMD:
>
>     Kim Phillips:
>
>     - Add L3 cache events for Family 17h.
>
>     - Remove redundant '['.
>
>   PowerPC:
>
>     Mamatha Inamdar:
>
>     - Remove P8 HW events which are not supported.
>
> Cleanups:
>
>   Arnaldo Carvalho de Melo:
>
>   - Remove needless headers, add needed ones, move things around to reduce the
>     headers dependency tree, speeding up builds by not doing needless compiles
>     when unrelated stuff gets changed.
>
>   - Ditch unused code that was dragging headers.
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Andi Kleen (2): > perf stat: Fix free memory access / memory leaks in metrics > perf evlist: Fix access of freed id arrays > > Arnaldo Carvalho de Melo (12): > perf record: Move restricted maps check to after a possible fallback to not collect kernel samples > perf evlist: Adopt backwards ring buffer state enum > libperf: Add missing 'struct xyarray' forward declaration > perf tools: No need to include internal/lib.h from util/util.h > libperf: Use sys/types.h to get ssize_t, not unistd.h > perf copyfile: Move copyfile routines to separate files > perf evsel: Remove need for symbol_conf in evsel_fprintf.c > perf evsel: Introduce evsel_fprintf.h > perf evlist: Remove unused perf_evlist__fprintf() method > perf evsel: Move config terms to a separate header > perf tools: Replace needless mmap.h with what is needed, event.h > perf parser: Remove needless include directives > > Jiri Olsa (37): > tools: Add missing stdio.h include to asm/bug.h header > perf tools: Rename 'struct perf_mmap' to 'struct mmap' > perf tools: Rename perf_evlist__mmap() to evlist__mmap() > perf tools: Rename perf_evlist__munmap() to evlist__munmap() > perf tools: Rename perf_evlist__alloc_mmap() to evlist__alloc_mmap() > perf tools: Rename perf_evlist__exit() to evlist__exit() > perf tools: Rename perf_evlist__purge() to evlist__purge() > libperf: Link libapi.a in libperf.so > libperf: Add perf_mmap struct > libperf: Add 'mask' to struct perf_mmap > libperf: Add 'fd' to struct perf_mmap > libperf: Add 'cpu' to struct perf_mmap > libperf: Add 'refcnt' to struct perf_mmap > libperf: Add prev/start/end to struct perf_mmap > libperf: Add 'overwrite' to 'struct perf_mmap' > libperf: Add 'event_copy' to 'struct perf_mmap' > libperf: Add 'flush' to 'struct perf_mmap' > libperf: Move 'system_wide' from 'struct evsel' to 'struct perf_evsel' > libperf: Move 'nr_mmaps' from 
'struct evlist' to 'struct perf_evlist' > libperf: Move 'mmap_len' from 'struct evlist' to 'struct perf_evlist' > libperf: Move 'pollfd' from 'struct evlist' to 'struct perf_evlist' > libperf: Move 'sample_id' from 'struct evsel' to 'struct perf_evsel' > libperf: Move 'id' from 'struct evsel' to 'struct perf_evsel' > libperf: Move 'ids' from 'struct evsel' to 'struct perf_evsel' > libperf: Move 'heads' from 'struct evlist' to 'struct perf_evlist' > libperf: Add perf_evsel__alloc_id/perf_evsel__free_id functions > libperf: Add perf_evlist__first()/last() functions > libperf: Add perf_evlist__read_format() function > libperf: Add perf_evlist__id_add() function > libperf: Add perf_evlist__id_add_fd() function > libperf: Move 'page_size' global variable to libperf > libperf: Add libperf dependency for tests targets > libperf: Merge libperf_set_print() into libperf_init() > libperf: Add libperf_init() call to the tests > libperf: Add perf_evlist__alloc_pollfd() function > libperf: Add perf_evlist__add_pollfd() function > libperf: Add perf_evlist__poll() function > > Kim Phillips (4): > perf vendor events amd: Add L3 cache events for Family 17h > perf vendor events amd: Remove redundant '[' > perf vendor events: Minor fixes to the README > perf list: Allow plurals for metric, metricgroup > > Mamatha Inamdar (1): > perf vendor events: Remove P8 HW events which are not supported > > Stephane Eranian (1): > perf record: Fix priv level with branch sampling for paranoid=2 > > Steven Rostedt (VMware) (1): > libtraceevent: Round up in tep_print_event() time precision > > Thomas Richter (2): > perf jvmti: Include JVMTI support for s390 > perf build: Add detection of java-11-openjdk-devel package > > Tzvetomir Stoyanov (2): > libtraceevent: Man pages for libtraceevent event print related API > libtraceevent: Man pages for tep plugins APIs > > Tzvetomir Stoyanov (VMware) (4): > libtraceevent: Man pages fix, rename tep_ref_get() to tep_get_ref() > libtraceevent: Man pages fix, 
changes in event printing APIs > libtraceevent: Add tep_get_event() in event-parse.h > libtraceevent: Move traceevent plugins in its own subdirectory > > tools/include/asm/bug.h | 1 + > tools/lib/traceevent/Build | 11 - > .../Documentation/libtraceevent-event_print.txt | 130 +++++++++ > .../Documentation/libtraceevent-handle.txt | 8 +- > .../Documentation/libtraceevent-plugins.txt | 99 +++++++ > .../lib/traceevent/Documentation/libtraceevent.txt | 15 +- > tools/lib/traceevent/Makefile | 94 ++----- > tools/lib/traceevent/event-parse.c | 4 +- > tools/lib/traceevent/event-parse.h | 2 + > tools/lib/traceevent/plugins/Build | 10 + > tools/lib/traceevent/plugins/Makefile | 222 ++++++++++++++++ > .../lib/traceevent/{ => plugins}/plugin_cfg80211.c | 0 > .../lib/traceevent/{ => plugins}/plugin_function.c | 0 > .../lib/traceevent/{ => plugins}/plugin_hrtimer.c | 0 > tools/lib/traceevent/{ => plugins}/plugin_jbd2.c | 0 > tools/lib/traceevent/{ => plugins}/plugin_kmem.c | 0 > tools/lib/traceevent/{ => plugins}/plugin_kvm.c | 0 > .../lib/traceevent/{ => plugins}/plugin_mac80211.c | 0 > .../traceevent/{ => plugins}/plugin_sched_switch.c | 0 > tools/lib/traceevent/{ => plugins}/plugin_scsi.c | 0 > tools/lib/traceevent/{ => plugins}/plugin_xen.c | 0 > tools/perf/Makefile.config | 2 +- > tools/perf/Makefile.perf | 4 +- > tools/perf/arch/arm/util/cs-etm.c | 7 +- > tools/perf/arch/arm64/util/arm-spe.c | 6 +- > tools/perf/arch/s390/Makefile | 1 + > tools/perf/arch/s390/util/auxtrace.c | 1 + > tools/perf/arch/s390/util/machine.c | 2 +- > tools/perf/arch/x86/tests/intel-cqm.c | 5 +- > tools/perf/arch/x86/tests/perf-time-to-tsc.c | 11 +- > tools/perf/arch/x86/tests/rdpmc.c | 2 +- > tools/perf/arch/x86/util/intel-bts.c | 9 +- > tools/perf/arch/x86/util/intel-pt.c | 17 +- > tools/perf/arch/x86/util/machine.c | 2 +- > tools/perf/builtin-evlist.c | 1 + > tools/perf/builtin-kvm.c | 13 +- > tools/perf/builtin-list.c | 4 +- > tools/perf/builtin-record.c | 102 +++---- > 
tools/perf/builtin-sched.c | 3 +- > tools/perf/builtin-script.c | 11 +- > tools/perf/builtin-stat.c | 6 +- > tools/perf/builtin-top.c | 22 +- > tools/perf/builtin-trace.c | 17 +- > tools/perf/lib/Makefile | 35 ++- > tools/perf/lib/core.c | 13 +- > tools/perf/lib/evlist.c | 124 +++++++++ > tools/perf/lib/evsel.c | 30 +++ > tools/perf/lib/include/internal/evlist.h | 33 +++ > tools/perf/lib/include/internal/evsel.h | 33 +++ > tools/perf/lib/include/internal/lib.h | 4 +- > tools/perf/lib/include/internal/mmap.h | 32 +++ > tools/perf/lib/include/perf/core.h | 2 +- > tools/perf/lib/include/perf/evlist.h | 1 + > tools/perf/lib/lib.c | 2 + > tools/perf/lib/libperf.map | 3 +- > tools/perf/lib/tests/test-cpumap.c | 10 + > tools/perf/lib/tests/test-evlist.c | 10 + > tools/perf/lib/tests/test-evsel.c | 10 + > tools/perf/lib/tests/test-threadmap.c | 10 + > tools/perf/perf.c | 13 +- > tools/perf/pmu-events/README | 22 +- > .../perf/pmu-events/arch/powerpc/power8/other.json | 24 -- > .../perf/pmu-events/arch/x86/amdfam17h/cache.json | 42 +++ > tools/perf/pmu-events/arch/x86/amdfam17h/core.json | 2 +- > tools/perf/pmu-events/jevents.c | 1 + > tools/perf/tests/backward-ring-buffer.c | 11 +- > tools/perf/tests/bpf.c | 9 +- > tools/perf/tests/code-reading.c | 11 +- > tools/perf/tests/event-times.c | 14 +- > tools/perf/tests/event_update.c | 6 +- > tools/perf/tests/evsel-roundtrip-name.c | 2 +- > tools/perf/tests/hists_cumulate.c | 2 +- > tools/perf/tests/hists_link.c | 5 +- > tools/perf/tests/hists_output.c | 2 +- > tools/perf/tests/keep-tracking.c | 11 +- > tools/perf/tests/mmap-basic.c | 5 +- > tools/perf/tests/mmap-thread-lookup.c | 2 +- > tools/perf/tests/openat-syscall-tp-fields.c | 11 +- > tools/perf/tests/parse-events.c | 116 ++++---- > tools/perf/tests/perf-record.c | 13 +- > tools/perf/tests/sdt.c | 1 + > tools/perf/tests/sw-clock.c | 5 +- > tools/perf/tests/switch-tracking.c | 29 +- > tools/perf/tests/task-exit.c | 9 +- > tools/perf/tests/vmlinux-kallsyms.c | 2 +- > 
tools/perf/ui/browsers/hists.c | 6 +- > tools/perf/ui/gtk/hists.c | 1 + > tools/perf/util/Build | 2 + > tools/perf/util/annotate.c | 1 + > tools/perf/util/auxtrace.c | 8 +- > tools/perf/util/auxtrace.h | 8 +- > tools/perf/util/bpf-loader.c | 2 +- > tools/perf/util/build-id.c | 3 +- > tools/perf/util/copyfile.c | 144 ++++++++++ > tools/perf/util/copyfile.h | 16 ++ > tools/perf/util/cs-etm.c | 2 +- > tools/perf/util/evlist.c | 295 ++++++--------------- > tools/perf/util/evlist.h | 81 +++--- > tools/perf/util/evsel.c | 204 ++------------ > tools/perf/util/evsel.h | 121 +-------- > tools/perf/util/evsel_config.h | 50 ++++ > tools/perf/util/evsel_fprintf.c | 15 +- > tools/perf/util/evsel_fprintf.h | 50 ++++ > tools/perf/util/genelf.h | 3 + > tools/perf/util/header.c | 29 +- > tools/perf/util/intel-bts.c | 4 +- > tools/perf/util/intel-pt.c | 10 +- > tools/perf/util/jitdump.c | 2 +- > tools/perf/util/machine.c | 1 + > tools/perf/util/mmap.c | 185 ++++++------- > tools/perf/util/mmap.h | 77 ++---- > tools/perf/util/parse-events.c | 8 +- > tools/perf/util/parse-events.y | 4 +- > tools/perf/util/perf_event_attr_fprintf.c | 148 +++++++++++ > tools/perf/util/python-ext-sources | 1 + > tools/perf/util/python.c | 24 +- > tools/perf/util/record.c | 6 +- > tools/perf/util/session.c | 5 +- > tools/perf/util/sort.c | 2 +- > tools/perf/util/srccode.c | 2 +- > tools/perf/util/stat-shadow.c | 4 +- > tools/perf/util/stat.c | 2 +- > tools/perf/util/symbol-elf.c | 2 +- > tools/perf/util/synthetic-events.c | 20 +- > tools/perf/util/top.c | 2 +- > tools/perf/util/trace-event-info.c | 2 +- > tools/perf/util/util.c | 136 ---------- > tools/perf/util/util.h | 8 - > 128 files changed, 1941 insertions(+), 1321 deletions(-) > create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-event_print.txt > create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-plugins.txt > create mode 100644 tools/lib/traceevent/plugins/Build > create mode 100644 
tools/lib/traceevent/plugins/Makefile > rename tools/lib/traceevent/{ => plugins}/plugin_cfg80211.c (100%) > rename tools/lib/traceevent/{ => plugins}/plugin_function.c (100%) > rename tools/lib/traceevent/{ => plugins}/plugin_hrtimer.c (100%) > rename tools/lib/traceevent/{ => plugins}/plugin_jbd2.c (100%) > rename tools/lib/traceevent/{ => plugins}/plugin_kmem.c (100%) > rename tools/lib/traceevent/{ => plugins}/plugin_kvm.c (100%) > rename tools/lib/traceevent/{ => plugins}/plugin_mac80211.c (100%) > rename tools/lib/traceevent/{ => plugins}/plugin_sched_switch.c (100%) > rename tools/lib/traceevent/{ => plugins}/plugin_scsi.c (100%) > rename tools/lib/traceevent/{ => plugins}/plugin_xen.c (100%) > create mode 100644 tools/perf/lib/include/internal/mmap.h > create mode 100644 tools/perf/util/copyfile.c > create mode 100644 tools/perf/util/copyfile.h > create mode 100644 tools/perf/util/evsel_config.h > create mode 100644 tools/perf/util/evsel_fprintf.h > create mode 100644 tools/perf/util/perf_event_attr_fprintf.c Pulled, thanks a lot Arnaldo! Ingo ^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes @ 2019-09-20 14:25 Arnaldo Carvalho de Melo 2019-09-20 16:15 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-09-20 14:25 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Anju T Sudhakar, Colin King, James Clark, Ravi Bangoria, Sakari Ailus, Srikar Dronamraju, Thomas Richter, Arnaldo Carvalho de Melo Hi Ingo/Thomas, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit e336b4027775cb458dc713745e526fa1a1996b2a: kprobes: Prohibit probing on BUG() and WARN() address (2019-09-05 10:15:16 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190920-2 for you to fetch changes up to 2bff2b828502b5e5d5ea5a52643d3542053df03f: perf kvm stat: Set 'trace_cycles' as default event for 'perf kvm record' in powerpc (2019-09-20 10:28:26 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf stat: Srikar Dronamraju: - Fix a segmentation fault when using repeat forever. - Reset previous counts on repeat with interval. aarch64: James Clark: - Add PMU event JSON files for Cortex-A76 and Neoverse N1. PowerPC: Anju T Sudhakar: - Make 'trace_cycles' the default event for 'perf kvm record' in PowerPC. S/390: - Link libjvmti to tools/lib/string.o to have a weak strlcpy() implementation, providing previously unresolved symbol on s/390. perf test: Jiri Olsa: - Add libperf automated tests to 'make -C tools/perf build-test'. Colin Ian King: - Fix spelling mistake. Tree wide: Arnaldo Carvalho de Melo: - Some more header file sanitization. libperf: Jiri Olsa: - Add dependency on libperf for python.so binding. 
libtraceevent: Sakari Ailus: - Convert remaining %p[fF] users to %p[sS]. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Anju T Sudhakar (3): perf kvm: Move kvm-stat header file from conditional inclusion to common include section perf kvm: Add arch neutral function to choose event for perf kvm record perf kvm stat: Set 'trace_cycles' as default event for 'perf kvm record' in powerpc Arnaldo Carvalho de Melo (19): perf jvmti: Link against tools/lib/string.o to have weak strlcpy() perf tools: Remove needless builtin.h include directives perf debug: No need to include ui/util.h perf tools: Remove debug.h from places where it is not needed perf tools: Remove util.h from where it is not needed perf probe: Add missing build-id.h header. perf symbols: Add missing dso.h header perf env: Remove needless cpumap.h header perf event: Move perf_event__synthesize* to event.h perf stat: Move perf_stat_synthesize_config() to event.h perf callchain: Remove needless event.h include perf python: Remove debug.h perf hist: Add missing 'struct branch_stack' forward declaration perf annotate: Add missing machine.h include directive perf sched: Add missing event.h include directive perf auxtrace: Add missing 'struct perf_sample' forward declaration perf tools: Move event synthesizing routines to separate header perf memswap: Adopt 'struct u64_swap' from evsel.h perf tools: Move event synthesizing routines to separate .c file Colin Ian King (1): perf test: Fix spelling mistake "allos" -> "allocate" James Clark (1): perf tools: Add PMU event JSON files for ARM Cortex-A76 and, Neoverse N1. 
Jiri Olsa (4): perf python: Add missing python/perf.so dependency for libperf perf tests: Add libperf automated test for 'make -C tools/perf build-test' libperf: Add missing event.h file to install rule libperf: Adopt perf_cpu_map__max() function Sakari Ailus (1): tools lib traceevent: Convert remaining %p[fF] users to %p[sS] Srikar Dronamraju (2): perf stat: Reset previous counts on repeat with interval perf stat: Fix a segmentation fault when using repeat forever .../Documentation/libtraceevent-func_apis.txt | 10 +- tools/lib/traceevent/event-parse.c | 18 +- tools/perf/Makefile.perf | 2 +- tools/perf/arch/arm/util/cs-etm.c | 2 +- tools/perf/arch/arm64/util/arm-spe.c | 2 +- tools/perf/arch/arm64/util/dwarf-regs.c | 1 - tools/perf/arch/arm64/util/header.c | 4 +- tools/perf/arch/arm64/util/unwind-libunwind.c | 2 +- tools/perf/arch/powerpc/util/dwarf-regs.c | 1 - tools/perf/arch/powerpc/util/header.c | 1 - tools/perf/arch/powerpc/util/kvm-stat.c | 45 + tools/perf/arch/powerpc/util/skip-callchain-idx.c | 1 + tools/perf/arch/powerpc/util/sym-handling.c | 1 - tools/perf/arch/s390/util/machine.c | 2 +- tools/perf/arch/x86/tests/intel-cqm.c | 1 - tools/perf/arch/x86/tests/perf-time-to-tsc.c | 1 - tools/perf/arch/x86/tests/rdpmc.c | 2 +- tools/perf/arch/x86/util/archinsn.c | 1 + tools/perf/arch/x86/util/event.c | 2 + tools/perf/arch/x86/util/intel-bts.c | 2 +- tools/perf/arch/x86/util/intel-pt.c | 2 +- tools/perf/arch/x86/util/machine.c | 3 +- tools/perf/arch/x86/util/tsc.c | 2 + tools/perf/bench/epoll-ctl.c | 2 +- tools/perf/bench/epoll-wait.c | 2 +- tools/perf/bench/futex-hash.c | 2 +- tools/perf/bench/futex-lock-pi.c | 2 +- tools/perf/bench/futex-requeue.c | 2 +- tools/perf/bench/futex-wake-parallel.c | 3 +- tools/perf/bench/futex-wake.c | 2 +- tools/perf/bench/numa.c | 1 - tools/perf/bench/sched-messaging.c | 2 - tools/perf/bench/sched-pipe.c | 2 - tools/perf/builtin-annotate.c | 1 + tools/perf/builtin-c2c.c | 1 + tools/perf/builtin-config.c | 1 - 
tools/perf/builtin-evlist.c | 2 - tools/perf/builtin-inject.c | 1 + tools/perf/builtin-kvm.c | 15 +- tools/perf/builtin-record.c | 10 +- tools/perf/builtin-report.c | 2 +- tools/perf/builtin-sched.c | 3 + tools/perf/builtin-stat.c | 24 +- tools/perf/builtin-top.c | 1 + tools/perf/builtin-trace.c | 1 + tools/perf/jvmti/Build | 9 + tools/perf/lib/Makefile | 1 + tools/perf/lib/cpumap.c | 12 + tools/perf/lib/include/perf/cpumap.h | 1 + tools/perf/lib/libperf.map | 1 + tools/perf/perf.c | 2 +- .../arch/arm64/arm/cortex-a76-n1/branch.json | 14 + .../arch/arm64/arm/cortex-a76-n1/bus.json | 24 + .../arch/arm64/arm/cortex-a76-n1/cache.json | 207 +++ .../arch/arm64/arm/cortex-a76-n1/exception.json | 52 + .../arch/arm64/arm/cortex-a76-n1/instruction.json | 108 ++ .../arch/arm64/arm/cortex-a76-n1/memory.json | 23 + .../arch/arm64/arm/cortex-a76-n1/other.json | 7 + .../arch/arm64/arm/cortex-a76-n1/pipeline.json | 14 + tools/perf/pmu-events/arch/arm64/mapfile.csv | 2 + tools/perf/tests/bitmap.c | 2 +- tools/perf/tests/clang.c | 2 - tools/perf/tests/code-reading.c | 2 +- tools/perf/tests/cpumap.c | 1 + tools/perf/tests/dso-data.c | 1 - tools/perf/tests/dwarf-unwind.c | 1 + tools/perf/tests/event-times.c | 1 - tools/perf/tests/event_update.c | 4 +- tools/perf/tests/hists_common.c | 2 + tools/perf/tests/keep-tracking.c | 3 +- tools/perf/tests/llvm.c | 1 - tools/perf/tests/make | 6 +- tools/perf/tests/mem2node.c | 2 +- tools/perf/tests/mmap-basic.c | 3 +- tools/perf/tests/mmap-thread-lookup.c | 4 +- tools/perf/tests/openat-syscall-all-cpus.c | 5 +- tools/perf/tests/parse-events.c | 1 - tools/perf/tests/parse-no-sample-id-all.c | 2 - tools/perf/tests/perf-hooks.c | 1 - tools/perf/tests/pmu.c | 1 - tools/perf/tests/sample-parsing.c | 2 +- tools/perf/tests/stat.c | 1 + tools/perf/tests/switch-tracking.c | 1 - tools/perf/tests/task-exit.c | 2 +- tools/perf/tests/thread-map.c | 1 + tools/perf/tests/topology.c | 2 +- tools/perf/tests/vmlinux-kallsyms.c | 2 +- tools/perf/ui/browser.c | 1 - 
tools/perf/ui/browsers/annotate.c | 1 - tools/perf/ui/browsers/header.c | 1 - tools/perf/ui/browsers/map.c | 1 - tools/perf/ui/browsers/res_sample.c | 2 +- tools/perf/ui/browsers/scripts.c | 3 +- tools/perf/ui/gtk/helpline.c | 1 - tools/perf/ui/gtk/progress.c | 1 - tools/perf/ui/gtk/setup.c | 3 +- tools/perf/ui/gtk/util.c | 1 - tools/perf/ui/helpline.c | 2 - tools/perf/ui/hist.c | 1 - tools/perf/ui/setup.c | 2 +- tools/perf/ui/stdio/hist.c | 1 + tools/perf/ui/tui/helpline.c | 1 - tools/perf/ui/tui/setup.c | 2 +- tools/perf/ui/tui/util.c | 1 - tools/perf/util/Build | 1 + tools/perf/util/annotate.c | 2 +- tools/perf/util/arm-spe.c | 1 - tools/perf/util/auxtrace.c | 6 +- tools/perf/util/auxtrace.h | 18 +- tools/perf/util/bpf-event.c | 1 + tools/perf/util/bpf-event.h | 15 +- tools/perf/util/branch.c | 2 - tools/perf/util/branch.h | 9 +- tools/perf/util/build-id.c | 2 +- tools/perf/util/callchain.c | 1 + tools/perf/util/callchain.h | 5 +- tools/perf/util/cloexec.c | 2 +- tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 1 - tools/perf/util/cs-etm.c | 2 +- tools/perf/util/data.c | 3 +- tools/perf/util/debug.c | 1 - tools/perf/util/debug.h | 2 +- tools/perf/util/demangle-java.c | 1 - tools/perf/util/demangle-rust.c | 1 - tools/perf/util/dwarf-regs.c | 1 - tools/perf/util/env.h | 3 +- tools/perf/util/event.c | 1109 +----------- tools/perf/util/event.h | 77 +- tools/perf/util/evlist.c | 2 +- tools/perf/util/evsel.c | 280 +-- tools/perf/util/evsel.h | 5 - tools/perf/util/evsel_fprintf.c | 1 + tools/perf/util/header.c | 395 +--- tools/perf/util/header.h | 60 +- tools/perf/util/hist.h | 1 + tools/perf/util/intel-bts.c | 2 +- tools/perf/util/intel-pt.c | 1 + tools/perf/util/jitdump.c | 2 - tools/perf/util/kvm-stat.h | 4 + tools/perf/util/libunwind/arm64.c | 1 - tools/perf/util/libunwind/x86_32.c | 1 - tools/perf/util/llvm-utils.c | 1 + tools/perf/util/lzma.c | 2 +- tools/perf/util/machine.c | 15 - tools/perf/util/machine.h | 15 - tools/perf/util/memswap.h | 7 + 
tools/perf/util/namespaces.c | 18 + tools/perf/util/namespaces.h | 2 + tools/perf/util/parse-events.c | 1 - tools/perf/util/perf-hooks.c | 1 - tools/perf/util/pmu.c | 1 - tools/perf/util/probe-file.c | 1 + tools/perf/util/python.c | 4 +- tools/perf/util/record.c | 2 - tools/perf/util/rwsem.c | 1 + tools/perf/util/s390-cpumsf.c | 1 - tools/perf/util/s390-sample-raw.c | 1 - .../util/scripting-engines/trace-event-python.c | 2 - tools/perf/util/session.c | 72 +- tools/perf/util/session.h | 5 - tools/perf/util/srccode.c | 2 +- tools/perf/util/stat.c | 60 +- tools/perf/util/stat.h | 9 +- tools/perf/util/svghelper.c | 2 +- tools/perf/util/symbol-elf.c | 3 + tools/perf/util/symbol-minimal.c | 3 +- tools/perf/util/symbol.c | 2 +- tools/perf/util/synthetic-events.c | 1884 ++++++++++++++++++++ tools/perf/util/synthetic-events.h | 103 ++ tools/perf/util/target.c | 2 - tools/perf/util/top.c | 1 - tools/perf/util/trace-event-info.c | 2 +- tools/perf/util/trace-event-read.c | 1 - tools/perf/util/trace-event.c | 1 - tools/perf/util/tsc.h | 14 +- tools/perf/util/unwind-libdw.c | 1 - tools/perf/util/unwind-libunwind-local.c | 1 - tools/perf/util/usage.c | 1 - tools/perf/util/vdso.c | 2 +- tools/perf/util/zlib.c | 4 +- 180 files changed, 2763 insertions(+), 2256 deletions(-) create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/branch.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/bus.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/exception.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/instruction.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/memory.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/other.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/pipeline.json create mode 100644 tools/perf/util/synthetic-events.c create 
mode 100644 tools/perf/util/synthetic-events.h

Test results:

The first ones are container-based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and to build with LIBCLANGLLVM=1 (built-in clang), using both gcc and clang, when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from an HTTP server to build them inside the container, to make it easier to build in a container cluster. Those will come back later.

Several are cross builds (the ones with -x-ARCH and the android one), and those may not have all the features built due to the lack of multi-arch devel packages, which are available and used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping to put this in place.

Clearlinux is failing when building with libpython, but that is not a perf regression; I will try to remove the one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf.
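As a rough illustration of the build-test idea described above, here is a minimal Python sketch that only assembles (and does not run) one make command line per feature set. The NO_*=1 and DEBUG=1 settings are ones that appear in the build-test output of this message; the variant names, the build_commands() helper and the /tmp/build-test output root are hypothetical, and this is not how tools/perf/tests/make is implemented.

```python
import shlex

# Feature sets to exercise. The make variables are taken from the
# build-test output in this message; the dictionary keys are
# illustrative labels only.
VARIANTS = {
    "pure": [],
    "debug": ["DEBUG=1"],
    "no_libelf": ["NO_LIBELF=1"],
    "no_libunwind": ["NO_LIBUNWIND=1"],
    "no_scripts": ["NO_LIBPYTHON=1", "NO_LIBPERL=1"],
    "no_ui": ["NO_NEWT=1", "NO_SLANG=1", "NO_GTK2=1"],
}


def build_commands(variants, srcdir="tools/perf", out_root="/tmp/build-test"):
    """Return {variant name: argv list} for one out-of-tree build each.

    out_root is a hypothetical scratch directory; every variant gets
    its own O= directory so the builds do not share object files.
    """
    commands = {}
    for name, flags in variants.items():
        commands[name] = ["make", "-C", srcdir, f"O={out_root}/{name}", *flags]
    return commands


if __name__ == "__main__":
    # Dry run: print the command lines instead of executing them.
    for name, argv in build_commands(VARIANTS).items():
        print(f"{name}: {shlex.join(argv)}")
```

To actually run the builds, each argv list could be passed to subprocess.run() from the top of a kernel source tree; the sketch stays a dry run so that it has no side effects.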
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.3.0-rc6.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20190908 gcc-9-branch@275492, clang version 8.0.0 (tags/RELEASE_800/final) 17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 19 debian:10 : Ok gcc 
(Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 20 debian:experimental : Ok gcc (Debian 9.2.1-8) 9.2.1 20190909, clang version 8.0.1-3+b1 (tags/RELEASE_801/final) 21 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 22 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 24 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 25 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 26 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 27 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 28 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 29 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 30 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 31 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 32 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 33 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 34 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 35 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 36 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 37 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 
8.0.0-3.fc31.1) 39 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1) 40 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 41 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 42 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 43 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 44 manjaro:latest : Ok gcc (GCC) 9.1.0, clang version 8.0.1 (tags/RELEASE_801/final) 45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 7.0.1 (tags/RELEASE_701/final 349238) 47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.2.1 20190820 [gcc-9-branch revision 274748], clang version 8.0.1 (tags/RELEASE_801/final 366581) 49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 51 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 52 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 53 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 54 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 55 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 
5.4.0 20160609 57 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 62 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 63 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 73 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 74 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 76 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 77 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-8ubuntu1) 9.2.1 20190909, clang version 
9.0.0-+rc5-1~exp1 (tags/RELEASE_900/rc5) # # uname -a Linux quaco 5.3.0+ #2 SMP Thu Sep 19 16:13:22 -03 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 2bff2b828502 perf kvm stat: Set 'trace_cycles' as default event for 'perf kvm record' in powerpc # perf version --build-options perf version 5.3.rc6.g2bff2b828502 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 
22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: 
Zstd perf.data compression/decompression : Ok # $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_install_O: make install make_util_map_o_O: make util/map.o make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_libelf_O: make NO_LIBELF=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_perf_o_O: make perf.o make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_clean_all_O: make clean all make_doc_O: make doc make_no_gtk2_O: make NO_GTK2=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_static_O: make LDFLAGS=-static make_help_O: make help make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_pure_O: make make_tags_O: make tags make_no_newt_O: make NO_NEWT=1 make_cscope_O: make cscope make_install_bin_O: make install-bin make_no_libbpf_O: make NO_LIBBPF=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_with_babeltrace_O: make LIBBABELTRACE=1 make_install_prefix_O: make install prefix=/tmp/krava make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_no_demangle_O: make NO_DEMANGLE=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_debug_O: make DEBUG=1 make_no_slang_O: make NO_SLANG=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_no_libperl_O: make NO_LIBPERL=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
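The 'perf test' listing above is regular enough to tally mechanically. What follows is a minimal Python sketch, assuming each result line has the form "N: description : status", that group headers such as "22: Watchpoint :" carry no status of their own, and that any reason in parentheses after the status contains no colon. parse_perf_test() is a hypothetical helper, not part of perf, and the sample lines are copied from the output in this message.

```python
from collections import Counter

SAMPLE = """\
21: Breakpoint accounting : Ok
22: Watchpoint :
22.1: Read Only Watchpoint : Skip
22.2: Write Only Watchpoint : Ok
15: syscalls:sys_enter_openat event fields : Ok
56: builtin clang support : Skip (not compiled in)
"""


def parse_perf_test(output):
    """Return a list of (test_id, description, status) tuples."""
    results = []
    for line in output.splitlines():
        test_id, sep, rest = line.strip().partition(":")
        # Keep only lines that start with "N:" or "N.M:".
        if not sep or not test_id.replace(".", "", 1).isdigit():
            continue
        # The status follows the LAST colon, because a description
        # may itself contain one (e.g. "syscalls:sys_enter_openat").
        description, sep2, outcome = rest.rpartition(":")
        words = outcome.split()
        if not sep2 or not words:
            continue  # group header such as "22: Watchpoint :"
        # First word is the status; the rest is an optional reason.
        results.append((test_id, description.strip(), words[0]))
    return results


if __name__ == "__main__":
    tally = Counter(status for _, _, status in parse_perf_test(SAMPLE))
    for status, count in sorted(tally.items()):
        print(f"{status}: {count}")
```

Anything whose status is neither Ok nor Skip could then be flagged for a closer look; no failing test appears in this message, so the exact wording of a failure status is left open here.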
* Re: [GIT PULL] perf/core improvements and fixes 2019-09-20 14:25 Arnaldo Carvalho de Melo @ 2019-09-20 16:15 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-09-20 16:15 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Anju T Sudhakar, Colin King, James Clark, Ravi Bangoria, Sakari Ailus, Srikar Dronamraju, Thomas Richter, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit e336b4027775cb458dc713745e526fa1a1996b2a: > > kprobes: Prohibit probing on BUG() and WARN() address (2019-09-05 10:15:16 +0200) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190920-2 > > for you to fetch changes up to 2bff2b828502b5e5d5ea5a52643d3542053df03f: > > perf kvm stat: Set 'trace_cycles' as default event for 'perf kvm record' in powerpc (2019-09-20 10:28:26 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > perf stat: > > Srikar Dronamraju: > > - Fix a segmentation fault when using repeat forever. > > - Reset previous counts on repeat with interval. > > aarch64: > > James Clark: > > - Add PMU event JSON files for Cortex-A76 and Neoverse N1. > > PowerPC: > > Anju T Sudhakar: > > - Make 'trace_cycles' the default event for 'perf kvm record' in PowerPC. > > S/390: > > - Link libjvmti to tools/lib/string.o to have a weak strlcpy() > implementation, providing previously unresolved symbol on s/390. > > perf test: > > Jiri Olsa: > > - Add libperf automated tests to 'make -C tools/perf build-test'. > > Colin Ian King: > > - Fix spelling mistake. 
> > Tree wide: > > Arnaldo Carvalho de Melo: > > - Some more header file sanitization. > > libperf: > > Jiri Olsa: > > - Add dependency on libperf for python.so binding. > > libtraceevent: > > Sakari Ailus: > > - Convert remaining %p[fF] users to %p[sS]. > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Anju T Sudhakar (3): > perf kvm: Move kvm-stat header file from conditional inclusion to common include section > perf kvm: Add arch neutral function to choose event for perf kvm record > perf kvm stat: Set 'trace_cycles' as default event for 'perf kvm record' in powerpc > > Arnaldo Carvalho de Melo (19): > perf jvmti: Link against tools/lib/string.o to have weak strlcpy() > perf tools: Remove needless builtin.h include directives > perf debug: No need to include ui/util.h > perf tools: Remove debug.h from places where it is not needed > perf tools: Remove util.h from where it is not needed > perf probe: Add missing build-id.h header. > perf symbols: Add missing dso.h header > perf env: Remove needless cpumap.h header > perf event: Move perf_event__synthesize* to event.h > perf stat: Move perf_stat_synthesize_config() to event.h > perf callchain: Remove needless event.h include > perf python: Remove debug.h > perf hist: Add missing 'struct branch_stack' forward declaration > perf annotate: Add missing machine.h include directive > perf sched: Add missing event.h include directive > perf auxtrace: Add missing 'struct perf_sample' forward declaration > perf tools: Move event synthesizing routines to separate header > perf memswap: Adopt 'struct u64_swap' from evsel.h > perf tools: Move event synthesizing routines to separate .c file > > Colin Ian King (1): > perf test: Fix spelling mistake "allos" -> "allocate" > > James Clark (1): > perf tools: Add PMU event JSON files for ARM Cortex-A76 and, Neoverse N1. 
> > Jiri Olsa (4): > perf python: Add missing python/perf.so dependency for libperf > perf tests: Add libperf automated test for 'make -C tools/perf build-test' > libperf: Add missing event.h file to install rule > libperf: Adopt perf_cpu_map__max() function > > Sakari Ailus (1): > tools lib traceevent: Convert remaining %p[fF] users to %p[sS] > > Srikar Dronamraju (2): > perf stat: Reset previous counts on repeat with interval > perf stat: Fix a segmentation fault when using repeat forever > > .../Documentation/libtraceevent-func_apis.txt | 10 +- > tools/lib/traceevent/event-parse.c | 18 +- > tools/perf/Makefile.perf | 2 +- > tools/perf/arch/arm/util/cs-etm.c | 2 +- > tools/perf/arch/arm64/util/arm-spe.c | 2 +- > tools/perf/arch/arm64/util/dwarf-regs.c | 1 - > tools/perf/arch/arm64/util/header.c | 4 +- > tools/perf/arch/arm64/util/unwind-libunwind.c | 2 +- > tools/perf/arch/powerpc/util/dwarf-regs.c | 1 - > tools/perf/arch/powerpc/util/header.c | 1 - > tools/perf/arch/powerpc/util/kvm-stat.c | 45 + > tools/perf/arch/powerpc/util/skip-callchain-idx.c | 1 + > tools/perf/arch/powerpc/util/sym-handling.c | 1 - > tools/perf/arch/s390/util/machine.c | 2 +- > tools/perf/arch/x86/tests/intel-cqm.c | 1 - > tools/perf/arch/x86/tests/perf-time-to-tsc.c | 1 - > tools/perf/arch/x86/tests/rdpmc.c | 2 +- > tools/perf/arch/x86/util/archinsn.c | 1 + > tools/perf/arch/x86/util/event.c | 2 + > tools/perf/arch/x86/util/intel-bts.c | 2 +- > tools/perf/arch/x86/util/intel-pt.c | 2 +- > tools/perf/arch/x86/util/machine.c | 3 +- > tools/perf/arch/x86/util/tsc.c | 2 + > tools/perf/bench/epoll-ctl.c | 2 +- > tools/perf/bench/epoll-wait.c | 2 +- > tools/perf/bench/futex-hash.c | 2 +- > tools/perf/bench/futex-lock-pi.c | 2 +- > tools/perf/bench/futex-requeue.c | 2 +- > tools/perf/bench/futex-wake-parallel.c | 3 +- > tools/perf/bench/futex-wake.c | 2 +- > tools/perf/bench/numa.c | 1 - > tools/perf/bench/sched-messaging.c | 2 - > tools/perf/bench/sched-pipe.c | 2 - > 
tools/perf/builtin-annotate.c | 1 + > tools/perf/builtin-c2c.c | 1 + > tools/perf/builtin-config.c | 1 - > tools/perf/builtin-evlist.c | 2 - > tools/perf/builtin-inject.c | 1 + > tools/perf/builtin-kvm.c | 15 +- > tools/perf/builtin-record.c | 10 +- > tools/perf/builtin-report.c | 2 +- > tools/perf/builtin-sched.c | 3 + > tools/perf/builtin-stat.c | 24 +- > tools/perf/builtin-top.c | 1 + > tools/perf/builtin-trace.c | 1 + > tools/perf/jvmti/Build | 9 + > tools/perf/lib/Makefile | 1 + > tools/perf/lib/cpumap.c | 12 + > tools/perf/lib/include/perf/cpumap.h | 1 + > tools/perf/lib/libperf.map | 1 + > tools/perf/perf.c | 2 +- > .../arch/arm64/arm/cortex-a76-n1/branch.json | 14 + > .../arch/arm64/arm/cortex-a76-n1/bus.json | 24 + > .../arch/arm64/arm/cortex-a76-n1/cache.json | 207 +++ > .../arch/arm64/arm/cortex-a76-n1/exception.json | 52 + > .../arch/arm64/arm/cortex-a76-n1/instruction.json | 108 ++ > .../arch/arm64/arm/cortex-a76-n1/memory.json | 23 + > .../arch/arm64/arm/cortex-a76-n1/other.json | 7 + > .../arch/arm64/arm/cortex-a76-n1/pipeline.json | 14 + > tools/perf/pmu-events/arch/arm64/mapfile.csv | 2 + > tools/perf/tests/bitmap.c | 2 +- > tools/perf/tests/clang.c | 2 - > tools/perf/tests/code-reading.c | 2 +- > tools/perf/tests/cpumap.c | 1 + > tools/perf/tests/dso-data.c | 1 - > tools/perf/tests/dwarf-unwind.c | 1 + > tools/perf/tests/event-times.c | 1 - > tools/perf/tests/event_update.c | 4 +- > tools/perf/tests/hists_common.c | 2 + > tools/perf/tests/keep-tracking.c | 3 +- > tools/perf/tests/llvm.c | 1 - > tools/perf/tests/make | 6 +- > tools/perf/tests/mem2node.c | 2 +- > tools/perf/tests/mmap-basic.c | 3 +- > tools/perf/tests/mmap-thread-lookup.c | 4 +- > tools/perf/tests/openat-syscall-all-cpus.c | 5 +- > tools/perf/tests/parse-events.c | 1 - > tools/perf/tests/parse-no-sample-id-all.c | 2 - > tools/perf/tests/perf-hooks.c | 1 - > tools/perf/tests/pmu.c | 1 - > tools/perf/tests/sample-parsing.c | 2 +- > tools/perf/tests/stat.c | 1 + > 
tools/perf/tests/switch-tracking.c | 1 - > tools/perf/tests/task-exit.c | 2 +- > tools/perf/tests/thread-map.c | 1 + > tools/perf/tests/topology.c | 2 +- > tools/perf/tests/vmlinux-kallsyms.c | 2 +- > tools/perf/ui/browser.c | 1 - > tools/perf/ui/browsers/annotate.c | 1 - > tools/perf/ui/browsers/header.c | 1 - > tools/perf/ui/browsers/map.c | 1 - > tools/perf/ui/browsers/res_sample.c | 2 +- > tools/perf/ui/browsers/scripts.c | 3 +- > tools/perf/ui/gtk/helpline.c | 1 - > tools/perf/ui/gtk/progress.c | 1 - > tools/perf/ui/gtk/setup.c | 3 +- > tools/perf/ui/gtk/util.c | 1 - > tools/perf/ui/helpline.c | 2 - > tools/perf/ui/hist.c | 1 - > tools/perf/ui/setup.c | 2 +- > tools/perf/ui/stdio/hist.c | 1 + > tools/perf/ui/tui/helpline.c | 1 - > tools/perf/ui/tui/setup.c | 2 +- > tools/perf/ui/tui/util.c | 1 - > tools/perf/util/Build | 1 + > tools/perf/util/annotate.c | 2 +- > tools/perf/util/arm-spe.c | 1 - > tools/perf/util/auxtrace.c | 6 +- > tools/perf/util/auxtrace.h | 18 +- > tools/perf/util/bpf-event.c | 1 + > tools/perf/util/bpf-event.h | 15 +- > tools/perf/util/branch.c | 2 - > tools/perf/util/branch.h | 9 +- > tools/perf/util/build-id.c | 2 +- > tools/perf/util/callchain.c | 1 + > tools/perf/util/callchain.h | 5 +- > tools/perf/util/cloexec.c | 2 +- > tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 1 - > tools/perf/util/cs-etm.c | 2 +- > tools/perf/util/data.c | 3 +- > tools/perf/util/debug.c | 1 - > tools/perf/util/debug.h | 2 +- > tools/perf/util/demangle-java.c | 1 - > tools/perf/util/demangle-rust.c | 1 - > tools/perf/util/dwarf-regs.c | 1 - > tools/perf/util/env.h | 3 +- > tools/perf/util/event.c | 1109 +----------- > tools/perf/util/event.h | 77 +- > tools/perf/util/evlist.c | 2 +- > tools/perf/util/evsel.c | 280 +-- > tools/perf/util/evsel.h | 5 - > tools/perf/util/evsel_fprintf.c | 1 + > tools/perf/util/header.c | 395 +--- > tools/perf/util/header.h | 60 +- > tools/perf/util/hist.h | 1 + > tools/perf/util/intel-bts.c | 2 +- > tools/perf/util/intel-pt.c | 
1 + > tools/perf/util/jitdump.c | 2 - > tools/perf/util/kvm-stat.h | 4 + > tools/perf/util/libunwind/arm64.c | 1 - > tools/perf/util/libunwind/x86_32.c | 1 - > tools/perf/util/llvm-utils.c | 1 + > tools/perf/util/lzma.c | 2 +- > tools/perf/util/machine.c | 15 - > tools/perf/util/machine.h | 15 - > tools/perf/util/memswap.h | 7 + > tools/perf/util/namespaces.c | 18 + > tools/perf/util/namespaces.h | 2 + > tools/perf/util/parse-events.c | 1 - > tools/perf/util/perf-hooks.c | 1 - > tools/perf/util/pmu.c | 1 - > tools/perf/util/probe-file.c | 1 + > tools/perf/util/python.c | 4 +- > tools/perf/util/record.c | 2 - > tools/perf/util/rwsem.c | 1 + > tools/perf/util/s390-cpumsf.c | 1 - > tools/perf/util/s390-sample-raw.c | 1 - > .../util/scripting-engines/trace-event-python.c | 2 - > tools/perf/util/session.c | 72 +- > tools/perf/util/session.h | 5 - > tools/perf/util/srccode.c | 2 +- > tools/perf/util/stat.c | 60 +- > tools/perf/util/stat.h | 9 +- > tools/perf/util/svghelper.c | 2 +- > tools/perf/util/symbol-elf.c | 3 + > tools/perf/util/symbol-minimal.c | 3 +- > tools/perf/util/symbol.c | 2 +- > tools/perf/util/synthetic-events.c | 1884 ++++++++++++++++++++ > tools/perf/util/synthetic-events.h | 103 ++ > tools/perf/util/target.c | 2 - > tools/perf/util/top.c | 1 - > tools/perf/util/trace-event-info.c | 2 +- > tools/perf/util/trace-event-read.c | 1 - > tools/perf/util/trace-event.c | 1 - > tools/perf/util/tsc.h | 14 +- > tools/perf/util/unwind-libdw.c | 1 - > tools/perf/util/unwind-libunwind-local.c | 1 - > tools/perf/util/usage.c | 1 - > tools/perf/util/vdso.c | 2 +- > tools/perf/util/zlib.c | 4 +- > 180 files changed, 2763 insertions(+), 2256 deletions(-) > create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/branch.json > create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/bus.json > create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/cache.json > create mode 100644 
tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/exception.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/instruction.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/memory.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/other.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/pipeline.json
> create mode 100644 tools/perf/util/synthetic-events.c
> create mode 100644 tools/perf/util/synthetic-events.h

Pulled, thanks a lot Arnaldo!

	Ingo
* [GIT PULL] perf/core improvements and fixes
@ 2019-09-01 12:22 Arnaldo Carvalho de Melo
  2019-09-02 7:14 ` Ingo Molnar
  0 siblings, 1 reply; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-09-01 12:22 UTC
To: Ingo Molnar, Thomas Gleixner
Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Jin Yao, Joe Mario, Josh Poimboeuf, Kyle Meyer, Patrick McLean, Steven Rostedt, Tzvetomir Stoyanov, Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

	Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 39c2ca43465e0f52ebba3ee96fd03436367c1880:

  Merge tag 'perf-core-for-mingo-5.4-20190829' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-08-29 20:56:32 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190901

for you to fetch changes up to ae31a514a134d9e4ca1d7b0f0a19b5934747d79f:

  objtool: Ignore intentional differences for the x86 insn decoder (2019-08-31 22:27:52 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

objtool:

  Josh Poimboeuf:

  - Move x86 insn decoder to a common location.

  Arnaldo Carvalho de Melo:

  - Ignore intentional differences for the x86 insn decoder.

build:

  Arnaldo Carvalho de Melo:

  - Ignore intentional differences for the x86 insn decoder.

Intel PT:

  Josh Poimboeuf:

  - Use shared x86 insn decoder.

metric groups:

  Jin Yao:

  - Scale the metric result.

  - Support multiple events.

perf c2c:

  Jiri Olsa:

  - Display proper cpu count in nodes column.

Miscellaneous:

  Kyle Meyer:

  - Replace MAX_NR_CPUS with perf_env::nr_cpus_online, i.e. with the number of
    online CPUs as detected at tool start and/or recorded in the perf.data file.

libtraceevent:

  Tzvetomir Stoyanov:

  - Simplify the tep_print_event_* APIs.

  - Remove tep_register_trace_clock().
- Change users plugin directory. Cleanups: Arnaldo Carvalho de Melo: - Continue taming the includes hell: remove needless include directives, fix the fallout, rinse, repeat. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Arnaldo Carvalho de Melo (29): perf tools: Remove needless libtraceevent include directives perf header: Move CPUINFO_PROC to the only file where it is used perf tools: Move everything related to sys_perf_event_open() to perf-sys.h perf time-utils: Adopt rdclock() from perf.h perf tools: Remove needless perf.h include directive from headers perf tools: Remove perf.h from source files not needing it perf tools: Remove debug.h from header files not needing it perf debug: Remove needless include directives from debug.h perf env: Remove env.h from other headers where just a fwd decl is needed perf event: Remove needless include directives from event.h perf dso: Adopt DSO related macros from symbol.h perf symbol: Move C++ demangle defines to the only file using it perf symbols: Add missing linux/refcount.h to symbol.h perf symbols: Move symsrc prototypes to a separate header perf dsos: Move the dsos struct and its methods to separate source files perf hist: Remove needless ui/progress.h from hist.h perf tools: Move 'struct events_stats' and prototypes to separate header perf tools: Remove needless sort.h include directives perf probe: No need for symbol.h, symbol_conf is enough perf tools: Remove needless map.h include directives perf tools: Remove needless thread.h include directives perf tools: Remove needless thread_map.h include directives perf tools: Remove needless evlist.h include directives perf tools: Remove needless evlist.h include directives perf auxtrace: Uninline functions that touch perf_session perf symbols: Move mem_info and branch_info out of symbol.h perf build: Ignore intentional differences for the x86 insn decoder objtool: Update sync-check.sh from perf's 
check-headers.sh objtool: Ignore intentional differences for the x86 insn decoder Jin Yao (3): perf pmu: Change convert_scale from static to global perf metricgroup: Scale the metric result perf metricgroup: Support multiple events for metricgroup Jiri Olsa (1): perf c2c: Display proper cpu count in nodes column Josh Poimboeuf (4): objtool: Move x86 insn decoder to a common location perf: Update .gitignore file perf intel-pt: Remove inat.c from build dependency list perf intel-pt: Use shared x86 insn decoder Kyle Meyer (7): perf timechart: Refactor svg_build_topology_map() perf svghelper: Replace MAX_NR_CPUS with perf_env::nr_cpus_online perf stat: Replace MAX_NR_CPUS with cpu__max_cpu() perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online perf header: Replace MAX_NR_CPUS with cpu__max_cpu() libperf: Warn when exceeding MAX_NR_CPUS in cpumap Tzvetomir Stoyanov (3): libtraceevent, perf tools: Changes in tep_print_event_* APIs libtraceevent: Remove tep_register_trace_clock() libtraceevent: Change users plugin directory .../x86/include/asm}/inat.h | 0 .../arch/x86/include/asm/inat_types.h | 0 .../x86/include/asm}/insn.h | 0 .../{objtool => }/arch/x86/include/asm/orc_types.h | 0 tools/{objtool => }/arch/x86/lib/inat.c | 2 +- tools/{objtool => }/arch/x86/lib/insn.c | 4 +- .../{objtool => }/arch/x86/lib/x86-opcode-map.txt | 0 .../arch/x86/tools/gen-insn-attr-x86.awk | 0 tools/lib/traceevent/Makefile | 6 +- tools/lib/traceevent/event-parse-api.c | 40 - tools/lib/traceevent/event-parse-local.h | 6 - tools/lib/traceevent/event-parse.c | 333 +++--- tools/lib/traceevent/event-parse.h | 30 +- tools/lib/traceevent/event-plugin.c | 2 +- tools/objtool/Makefile | 4 +- tools/objtool/arch/x86/Build | 4 +- tools/objtool/arch/x86/decode.c | 4 +- tools/objtool/arch/x86/include/asm/inat.h | 230 ----- tools/objtool/arch/x86/include/asm/insn.h | 216 ---- tools/objtool/sync-check.sh | 44 +- tools/perf/.gitignore | 3 
+ tools/perf/arch/arm/annotate/instructions.c | 1 + tools/perf/arch/arm/util/auxtrace.c | 1 + tools/perf/arch/arm/util/cs-etm.c | 4 +- tools/perf/arch/arm64/annotate/instructions.c | 1 + tools/perf/arch/arm64/util/sym-handling.c | 8 +- tools/perf/arch/common.c | 3 + tools/perf/arch/common.h | 4 +- tools/perf/arch/powerpc/util/mem-events.c | 1 + tools/perf/arch/powerpc/util/perf_regs.c | 1 - tools/perf/arch/powerpc/util/sym-handling.c | 1 + tools/perf/arch/powerpc/util/unwind-libdw.c | 1 + tools/perf/arch/x86/tests/bp-modify.c | 1 + tools/perf/arch/x86/tests/insn-x86.c | 3 +- tools/perf/arch/x86/tests/intel-cqm.c | 1 - tools/perf/arch/x86/tests/perf-time-to-tsc.c | 2 + tools/perf/arch/x86/tests/rdpmc.c | 4 +- tools/perf/arch/x86/util/archinsn.c | 3 +- tools/perf/arch/x86/util/perf_regs.c | 4 +- tools/perf/arch/x86/util/tsc.c | 2 +- tools/perf/bench/epoll-ctl.c | 1 + tools/perf/bench/epoll-wait.c | 1 + tools/perf/bench/mem-functions.c | 3 +- tools/perf/bench/numa.c | 1 - tools/perf/bench/sched-messaging.c | 1 - tools/perf/bench/sched-pipe.c | 1 - tools/perf/builtin-annotate.c | 4 +- tools/perf/builtin-bench.c | 1 - tools/perf/builtin-buildid-cache.c | 5 +- tools/perf/builtin-buildid-list.c | 4 +- tools/perf/builtin-c2c.c | 7 +- tools/perf/builtin-config.c | 3 +- tools/perf/builtin-data.c | 2 + tools/perf/builtin-diff.c | 2 + tools/perf/builtin-ftrace.c | 5 +- tools/perf/builtin-help.c | 5 +- tools/perf/builtin-inject.c | 2 +- tools/perf/builtin-kallsyms.c | 1 + tools/perf/builtin-kmem.c | 5 +- tools/perf/builtin-kvm.c | 5 +- tools/perf/builtin-list.c | 5 +- tools/perf/builtin-lock.c | 4 +- tools/perf/builtin-mem.c | 2 + tools/perf/builtin-probe.c | 5 +- tools/perf/builtin-record.c | 2 + tools/perf/builtin-report.c | 7 + tools/perf/builtin-sched.c | 3 +- tools/perf/builtin-script.c | 4 +- tools/perf/builtin-stat.c | 3 +- tools/perf/builtin-timechart.c | 10 +- tools/perf/builtin-top.c | 5 +- tools/perf/builtin-trace.c | 4 + tools/perf/builtin-version.c | 2 +- 
tools/perf/check-headers.sh | 11 +- tools/perf/lib/cpumap.c | 6 + tools/perf/perf-sys.h | 51 +- tools/perf/perf.c | 7 +- tools/perf/perf.h | 21 - tools/perf/scripts/perl/Perf-Trace-Util/Context.c | 1 - .../perf/scripts/python/Perf-Trace-Util/Context.c | 1 - tools/perf/tests/attr.c | 3 +- tools/perf/tests/backward-ring-buffer.c | 2 + tools/perf/tests/bp_account.c | 3 +- tools/perf/tests/bp_signal.c | 3 +- tools/perf/tests/bp_signal_overflow.c | 3 +- tools/perf/tests/bpf.c | 2 + tools/perf/tests/builtin-test.c | 1 + tools/perf/tests/code-reading.c | 8 + tools/perf/tests/dso-data.c | 1 + tools/perf/tests/dwarf-unwind.c | 1 + tools/perf/tests/event-times.c | 2 + tools/perf/tests/event_update.c | 3 + tools/perf/tests/expr.c | 1 + tools/perf/tests/hists_common.c | 3 +- tools/perf/tests/hists_cumulate.c | 2 +- tools/perf/tests/hists_filter.c | 2 - tools/perf/tests/hists_link.c | 2 - tools/perf/tests/hists_output.c | 2 +- tools/perf/tests/keep-tracking.c | 2 + tools/perf/tests/kmod-path.c | 2 + tools/perf/tests/llvm.c | 2 +- tools/perf/tests/mem.c | 1 + tools/perf/tests/mem2node.c | 2 + tools/perf/tests/mmap-basic.c | 3 + tools/perf/tests/openat-syscall-all-cpus.c | 1 + tools/perf/tests/openat-syscall-tp-fields.c | 1 + tools/perf/tests/openat-syscall.c | 1 + tools/perf/tests/parse-events.c | 1 + tools/perf/tests/perf-record.c | 1 + tools/perf/tests/sample-parsing.c | 2 + tools/perf/tests/sdt.c | 3 +- tools/perf/tests/sw-clock.c | 2 + tools/perf/tests/switch-tracking.c | 2 + tools/perf/tests/task-exit.c | 2 + tools/perf/tests/thread-map.c | 7 + tools/perf/tests/thread-mg-share.c | 1 - tools/perf/tests/unit_number__scnprintf.c | 1 + tools/perf/tests/vmlinux-kallsyms.c | 1 + tools/perf/tests/wp.c | 5 + tools/perf/ui/browser.c | 1 - tools/perf/ui/browsers/annotate.c | 2 + tools/perf/ui/browsers/header.c | 1 - tools/perf/ui/browsers/hists.c | 6 + tools/perf/ui/browsers/map.c | 1 + tools/perf/ui/browsers/res_sample.c | 3 + tools/perf/ui/browsers/scripts.c | 4 +- 
tools/perf/ui/gtk/annotate.c | 1 + tools/perf/ui/gtk/browser.c | 2 - tools/perf/ui/gtk/helpline.c | 1 + tools/perf/ui/gtk/hists.c | 1 - tools/perf/ui/gtk/setup.c | 1 - tools/perf/ui/gtk/util.c | 1 + tools/perf/ui/helpline.h | 2 - tools/perf/ui/hist.c | 4 + tools/perf/ui/progress.c | 1 - tools/perf/ui/setup.c | 3 +- tools/perf/ui/stdio/hist.c | 1 + tools/perf/ui/tui/helpline.c | 2 + tools/perf/ui/tui/progress.c | 1 - tools/perf/ui/tui/setup.c | 3 +- tools/perf/ui/tui/util.c | 1 - tools/perf/ui/util.c | 2 +- tools/perf/util/Build | 1 + tools/perf/util/annotate.c | 5 +- tools/perf/util/arm-spe.c | 4 +- tools/perf/util/auxtrace.c | 33 + tools/perf/util/auxtrace.h | 52 +- tools/perf/util/bpf-event.c | 1 + tools/perf/util/bpf-event.h | 1 + tools/perf/util/bpf-loader.c | 2 +- tools/perf/util/bpf-prologue.c | 2 +- tools/perf/util/branch.c | 3 +- tools/perf/util/branch.h | 8 + tools/perf/util/build-id.c | 1 + tools/perf/util/cacheline.c | 1 - tools/perf/util/callchain.c | 3 + tools/perf/util/callchain.h | 1 + tools/perf/util/cgroup.c | 3 +- tools/perf/util/cloexec.c | 4 +- tools/perf/util/color.c | 3 +- tools/perf/util/color_config.c | 3 +- tools/perf/util/config.c | 4 + tools/perf/util/cpumap.c | 1 - tools/perf/util/cputopo.h | 1 - tools/perf/util/cs-etm.c | 6 +- tools/perf/util/cs-etm.h | 3 +- tools/perf/util/data.c | 1 + tools/perf/util/db-export.c | 1 + tools/perf/util/debug.c | 6 +- tools/perf/util/debug.h | 6 +- tools/perf/util/dso.c | 237 +---- tools/perf/util/dso.h | 28 +- tools/perf/util/dsos.c | 232 +++++ tools/perf/util/dsos.h | 44 + tools/perf/util/dwarf-aux.c | 1 + tools/perf/util/dwarf-aux.h | 2 + tools/perf/util/env.c | 1 + tools/perf/util/event.c | 5 +- tools/perf/util/event.h | 61 +- tools/perf/util/events_stats.h | 51 + tools/perf/util/evlist.c | 3 + tools/perf/util/evlist.h | 3 +- tools/perf/util/evsel.c | 2 + tools/perf/util/evsel.h | 1 + tools/perf/util/expr.y | 2 + tools/perf/util/genelf.c | 3 +- tools/perf/util/genelf_debug.c | 1 - 
tools/perf/util/header.c | 27 +- tools/perf/util/hist.c | 7 + tools/perf/util/hist.h | 6 +- tools/perf/util/intel-bts.c | 2 +- tools/perf/util/intel-pt-decoder/Build | 22 +- .../util/intel-pt-decoder/gen-insn-attr-x86.awk | 392 ------- tools/perf/util/intel-pt-decoder/inat.c | 82 -- tools/perf/util/intel-pt-decoder/inat_types.h | 15 - tools/perf/util/intel-pt-decoder/insn.c | 593 ----------- .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 2 +- .../util/intel-pt-decoder/intel-pt-insn-decoder.c | 10 +- .../perf/util/intel-pt-decoder/x86-opcode-map.txt | 1072 -------------------- tools/perf/util/intel-pt.c | 2 +- tools/perf/util/jitdump.c | 1 + tools/perf/util/llvm-utils.c | 1 + tools/perf/util/llvm-utils.h | 2 +- tools/perf/util/lzma.c | 1 + tools/perf/util/machine.c | 18 +- tools/perf/util/machine.h | 3 +- tools/perf/util/map.c | 3 + tools/perf/util/mem-events.c | 2 +- tools/perf/util/mem-events.h | 9 + tools/perf/util/mem2node.c | 2 + tools/perf/util/mem2node.h | 3 +- tools/perf/util/metricgroup.c | 89 +- tools/perf/util/metricgroup.h | 1 + tools/perf/util/mmap.c | 4 + tools/perf/util/mmap.h | 1 + tools/perf/util/ordered-events.c | 1 + tools/perf/util/parse-branch-options.c | 3 +- tools/perf/util/parse-events.c | 4 +- tools/perf/util/path.c | 3 +- tools/perf/util/path.h | 3 + tools/perf/util/perf-hooks.c | 1 + tools/perf/util/pmu.c | 9 +- tools/perf/util/pmu.h | 2 + tools/perf/util/probe-event.c | 6 +- tools/perf/util/probe-file.c | 4 +- tools/perf/util/probe-finder.c | 1 + tools/perf/util/pstack.c | 1 + tools/perf/util/python.c | 4 + tools/perf/util/record.c | 4 + tools/perf/util/s390-cpumsf.c | 2 +- tools/perf/util/s390-sample-raw.c | 2 - .../perf/util/scripting-engines/trace-event-perl.c | 2 +- .../util/scripting-engines/trace-event-python.c | 3 +- tools/perf/util/session.c | 10 +- tools/perf/util/sort.c | 9 +- tools/perf/util/sort.h | 1 - tools/perf/util/stat-display.c | 1 + tools/perf/util/stat-shadow.c | 65 +- tools/perf/util/stat.c | 8 +- 
tools/perf/util/strbuf.c | 5 + tools/perf/util/svghelper.c | 54 +- tools/perf/util/svghelper.h | 4 +- tools/perf/util/symbol-elf.c | 7 + tools/perf/util/symbol-minimal.c | 2 + tools/perf/util/symbol.c | 5 + tools/perf/util/symbol.h | 63 +- tools/perf/util/symbol_fprintf.c | 1 + tools/perf/util/symsrc.h | 46 + tools/perf/util/target.c | 3 + tools/perf/util/thread-stack.c | 1 + tools/perf/util/thread.c | 2 +- tools/perf/util/time-utils.c | 1 - tools/perf/util/time-utils.h | 9 + tools/perf/util/top.c | 1 + tools/perf/util/top.h | 1 + tools/perf/util/trace-event-info.c | 1 - tools/perf/util/trace-event-parse.c | 3 +- tools/perf/util/trace-event-read.c | 1 - tools/perf/util/trace-event-scripting.c | 1 - tools/perf/util/trace-event.h | 1 - tools/perf/util/trigger.h | 1 - tools/perf/util/unwind-libdw.c | 1 + tools/perf/util/unwind-libunwind.c | 1 + tools/perf/util/util.c | 2 +- tools/perf/util/values.c | 1 + tools/perf/util/vdso.c | 1 + tools/perf/util/zlib.c | 1 + 267 files changed, 1319 insertions(+), 3578 deletions(-) rename tools/{perf/util/intel-pt-decoder => arch/x86/include/asm}/inat.h (100%) rename tools/{objtool => }/arch/x86/include/asm/inat_types.h (100%) rename tools/{perf/util/intel-pt-decoder => arch/x86/include/asm}/insn.h (100%) rename tools/{objtool => }/arch/x86/include/asm/orc_types.h (100%) rename tools/{objtool => }/arch/x86/lib/inat.c (98%) rename tools/{objtool => }/arch/x86/lib/insn.c (99%) rename tools/{objtool => }/arch/x86/lib/x86-opcode-map.txt (100%) rename tools/{objtool => }/arch/x86/tools/gen-insn-attr-x86.awk (100%) delete mode 100644 tools/objtool/arch/x86/include/asm/inat.h delete mode 100644 tools/objtool/arch/x86/include/asm/insn.h create mode 100644 tools/perf/util/dsos.c create mode 100644 tools/perf/util/dsos.h create mode 100644 tools/perf/util/events_stats.h delete mode 100644 tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk delete mode 100644 tools/perf/util/intel-pt-decoder/inat.c delete mode 100644 
tools/perf/util/intel-pt-decoder/inat_types.h
 delete mode 100644 tools/perf/util/intel-pt-decoder/insn.c
 delete mode 100644 tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
 create mode 100644 tools/perf/util/symsrc.h

Test results:

The first ones are container-based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and to build with LIBCLANGLLVM=1 (built-in clang) using both gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from an HTTP server and building inside the container, to make it easier to build in a container cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to the lack of multi-arch devel packages, which are available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one performs a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, and also runs perf commands with a variety of command line event specifications, intercepting the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. The plan is to run this on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping to put this in place.

Clearlinux is failing when building with libpython, but that is not a perf regression; I will try to remove the one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf.
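For anyone post-processing the container results that follow, here is a minimal sketch, assuming each result row has the shape "index image : status toolchain" as in the listing below. The helper name `summarize` and the "FAIL" status used in the usage note are hypothetical; neither is part of perf or of the 'dm' script, and only "Ok" appears in this message.

```python
import re

# Hypothetical helper, not part of perf or of the 'dm' script.
# Assumed row shape: "<index> <image> : <status> <toolchain versions>"
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+:\s+(\S+)\s+(.*)$")


def summarize(lines):
    """Return (number of result rows, images whose status is not 'Ok')."""
    total, failures = 0, []
    for line in lines:
        m = ROW.match(line)
        if not m:
            continue  # shell prompts and other non-result lines are skipped
        total += 1
        _index, image, status, _toolchain = m.groups()
        if status != "Ok":
            failures.append(image)
    return total, failures


sample = [
    "# dm",
    "   1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)",
    "  13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)",
]
print(summarize(sample))
```

A row with any status other than "Ok" (for instance a made-up "9 foo:1 : FAIL gcc") would be reported by image name in the second element of the returned tuple.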
  # export PERF_TARBALL=http://192.168.124.1/perf/perf-5.3.0-rc6.tar.xz
  # dm
   1 alpine:3.4                   : Ok   gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
   2 alpine:3.5                   : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
   3 alpine:3.6                   : Ok   gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
   4 alpine:3.7                   : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
   5 alpine:3.8                   : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
   6 alpine:3.9                   : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
   7 alpine:3.10                  : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
   8 alpine:edge                  : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1)
   9 amazonlinux:1                : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
  10 amazonlinux:2                : Ok   gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
  11 android-ndk:r12b-arm         : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  12 android-ndk:r15c-arm         : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  13 centos:5                     : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  14 centos:6                     : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
  15 centos:7                     : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
  16 clearlinux:latest            : Ok   gcc (Clear Linux OS for Intel Architecture) 9.2.1 20190816 gcc-9-branch@274554, clang version 8.0.0 (tags/RELEASE_800/final)
  17 debian:8                     : Ok   gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
  18 debian:9                     : Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
  19 debian:10                    : Ok   gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final)
  20 debian:experimental          : Ok   gcc (Debian 9.2.1-4) 9.2.1 20190821, clang version 7.0.1-9+b1 (tags/RELEASE_701/final)
  21 debian:experimental-x-arm64  : Ok   aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
  22 debian:experimental-x-mips   : Ok   mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
  23 debian:experimental-x-mips64 : Ok   mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0
  24 debian:experimental-x-mipsel : Ok   mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
  25 fedora:20                    : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
  26 fedora:22                    : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
  27 fedora:23                    : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
  28 fedora:24                    : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
  29 fedora:24-x-ARC-uClibc       : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  30 fedora:25                    : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
  31 fedora:26                    : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
  32 fedora:27                    : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
  33 fedora:28                    : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
  34 fedora:29                    : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
  35 fedora:30                    : Ok   gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30)
  36 fedora:30-x-ARC-glibc        : Ok   arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
  37 fedora:30-x-ARC-uClibc       : Ok   arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
  38 fedora:31                    : Ok   gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1)
  39 fedora:rawhide               : Ok   gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1)
  40 gentoo-stage3-amd64:latest   : Ok   gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0
  41 mageia:5                     : Ok   gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
  42 mageia:6                     : Ok   gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
  43 mageia:7                     : Ok   gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7)
  44 manjaro:latest               : Ok   gcc (GCC) 9.1.0, clang version 8.0.1 (tags/RELEASE_801/final)
  45 opensuse:15.0                : Ok   gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548)
  46 opensuse:15.1                : Ok   gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 7.0.1 (tags/RELEASE_701/final 349238)
  47 opensuse:42.3                : Ok   gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553)
  48 opensuse:tumbleweed          : Ok   gcc (SUSE Linux) 9.2.1 20190820 [gcc-9-branch revision 274748], clang version 8.0.1 (tags/RELEASE_801/final 366581)
  49 oraclelinux:6                : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
  50 oraclelinux:7                : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
  51 oraclelinux:8                : Ok   gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final)
  52 ubuntu:12.04                 : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
  53 ubuntu:14.04                 : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4)
  54 ubuntu:16.04                 : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
  55 ubuntu:16.04-x-arm           : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  56 ubuntu:16.04-x-arm64         : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  57 ubuntu:16.04-x-powerpc       : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  58 ubuntu:16.04-x-powerpc64     : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  59 ubuntu:16.04-x-powerpc64el   : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  60 ubuntu:16.04-x-s390          : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  61 ubuntu:18.04                 : Ok   gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
  62 ubuntu:18.04-x-arm           : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
  63 ubuntu:18.04-x-arm64         : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
  64 ubuntu:18.04-x-m68k          : Ok   m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  65 ubuntu:18.04-x-powerpc       : Ok   powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  66 ubuntu:18.04-x-powerpc64     : Ok   powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  67 ubuntu:18.04-x-powerpc64el   : Ok   powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  68 ubuntu:18.04-x-riscv64       : Ok   riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  69 ubuntu:18.04-x-s390          : Ok   s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  70 ubuntu:18.04-x-sh4           : Ok   sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  71 ubuntu:18.04-x-sparc64       : Ok   sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  72 ubuntu:18.10                 : Ok   gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final)
  73 ubuntu:19.04                 : Ok   gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final)
  74 ubuntu:19.04-x-alpha         : Ok   alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
  75 ubuntu:19.04-x-arm64         : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0
  76 ubuntu:19.04-x-hppa          : Ok   hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
  77 ubuntu:19.10                 : Ok   gcc (Ubuntu 9.2.1-4ubuntu1) 9.2.1 20190821, clang version
9.0.0-+rc2-1~exp1 (tags/RELEASE_900/rc2) # # uname -a Linux quaco 5.2.6-200.fc30.x86_64 #1 SMP Mon Aug 5 13:20:47 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 ae31a514a134 objtool: Ignore intentional differences for the x86 insn decoder # perf version --build-options perf version 5.3.rc6.gae31a514a134 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: 
Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd 
perf.data compression/decompression : Ok # $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . - /home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP: make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP feature-dump make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP feature-dump make_no_libunwind_O: make NO_LIBUNWIND=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_util_map_o_O: make util/map.o make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_no_slang_O: make NO_SLANG=1 make_install_O: make install make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_clean_all_O: make clean all make_no_gtk2_O: make NO_GTK2=1 make_pure_O: make make_no_libelf_O: make NO_LIBELF=1 make_debug_O: make DEBUG=1 make_no_newt_O: make NO_NEWT=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_no_libperl_O: make NO_LIBPERL=1 make_help_O: make help make_no_libnuma_O: make NO_LIBNUMA=1 make_no_demangle_O: make NO_DEMANGLE=1 make_tags_O: make tags make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_libpython_O: make NO_LIBPYTHON=1 - /home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC: make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC LDFLAGS='-static' feature-dump make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC LDFLAGS='-static' feature-dump make_static_O: make LDFLAGS=-static make_with_clangllvm_O: make LIBCLANGLLVM=1 make_install_bin_O: make 
install-bin make_cscope_O: make cscope make_perf_o_O: make perf.o make_doc_O: make doc make_no_libbpf_O: make NO_LIBBPF=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_install_prefix_O: make install prefix=/tmp/krava make_with_babeltrace_O: make LIBBABELTRACE=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $
* Re: [GIT PULL] perf/core improvements and fixes 2019-09-01 12:22 Arnaldo Carvalho de Melo @ 2019-09-02 7:14 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-09-02 7:14 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Jin Yao, Joe Mario, Josh Poimboeuf, Kyle Meyer, Patrick McLean, Steven Rostedt, Tzvetomir Stoyanov, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit 39c2ca43465e0f52ebba3ee96fd03436367c1880: > > Merge tag 'perf-core-for-mingo-5.4-20190829' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-08-29 20:56:32 +0200) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190901 > > for you to fetch changes up to ae31a514a134d9e4ca1d7b0f0a19b5934747d79f: > > objtool: Ignore intentional differences for the x86 insn decoder (2019-08-31 22:27:52 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > objtool: > > Josh Poimboeuf: > > - Move x86 insn decoder to a common location. > > Arnaldo Carvalho de Melo: > > - Ignore intentional differences for the x86 insn decoder. > > build: > > Arnaldo Carvalho de Melo: > > - Ignore intentional differences for the x86 insn decoder. > > Intel PT: > > Josh Poimboeuf: > > - Use shared x86 insn decoder. > > metric groups: > > Jin Yao: > > - Scale the metric result. > > - Support multiple events. > > perf c2c: > > Jiri Olsa: > > - Display proper cpu count in nodes column. > > Miscellaneous: > > Kyle Meyer: > > - Replace MAX_NR_CPUS with perf_env::nr_cpus_online, i.e. 
with > the number of online CPUs as detected at tool start and/or > recorded in the perf.data file. > > libtraceevent: > > Tzvetomir Stoyanov: > > - Simplify the tep_print_event_* APIs. > > - Remove tep_register_trace_clock(). > > - Change users plugin directory. > > Cleanups: > > Arnaldo Carvalho de Melo: > > - Continue taming the includes hell: remove needless include directives, fix > the fallout, rinse, repeat. > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Arnaldo Carvalho de Melo (29): > perf tools: Remove needless libtraceevent include directives > perf header: Move CPUINFO_PROC to the only file where it is used > perf tools: Move everything related to sys_perf_event_open() to perf-sys.h > perf time-utils: Adopt rdclock() from perf.h > perf tools: Remove needless perf.h include directive from headers > perf tools: Remove perf.h from source files not needing it > perf tools: Remove debug.h from header files not needing it > perf debug: Remove needless include directives from debug.h > perf env: Remove env.h from other headers where just a fwd decl is needed > perf event: Remove needless include directives from event.h > perf dso: Adopt DSO related macros from symbol.h > perf symbol: Move C++ demangle defines to the only file using it > perf symbols: Add missing linux/refcount.h to symbol.h > perf symbols: Move symsrc prototypes to a separate header > perf dsos: Move the dsos struct and its methods to separate source files > perf hist: Remove needless ui/progress.h from hist.h > perf tools: Move 'struct events_stats' and prototypes to separate header > perf tools: Remove needless sort.h include directives > perf probe: No need for symbol.h, symbol_conf is enough > perf tools: Remove needless map.h include directives > perf tools: Remove needless thread.h include directives > perf tools: Remove needless thread_map.h include directives > perf tools: Remove needless evlist.h 
include directives > perf tools: Remove needless evlist.h include directives > perf auxtrace: Uninline functions that touch perf_session > perf symbols: Move mem_info and branch_info out of symbol.h > perf build: Ignore intentional differences for the x86 insn decoder > objtool: Update sync-check.sh from perf's check-headers.sh > objtool: Ignore intentional differences for the x86 insn decoder > > Jin Yao (3): > perf pmu: Change convert_scale from static to global > perf metricgroup: Scale the metric result > perf metricgroup: Support multiple events for metricgroup > > Jiri Olsa (1): > perf c2c: Display proper cpu count in nodes column > > Josh Poimboeuf (4): > objtool: Move x86 insn decoder to a common location > perf: Update .gitignore file > perf intel-pt: Remove inat.c from build dependency list > perf intel-pt: Use shared x86 insn decoder > > Kyle Meyer (7): > perf timechart: Refactor svg_build_topology_map() > perf svghelper: Replace MAX_NR_CPUS with perf_env::nr_cpus_online > perf stat: Replace MAX_NR_CPUS with cpu__max_cpu() > perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online > perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online > perf header: Replace MAX_NR_CPUS with cpu__max_cpu() > libperf: Warn when exceeding MAX_NR_CPUS in cpumap > > Tzvetomir Stoyanov (3): > libtraceevent, perf tools: Changes in tep_print_event_* APIs > libtraceevent: Remove tep_register_trace_clock() > libtraceevent: Change users plugin directory > > 267 files changed, 1319 insertions(+), 3578 deletions(-) Pulled, thanks a lot Arnaldo! Ingo ^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes
@ 2019-08-29 14:38 Arnaldo Carvalho de Melo
  2019-08-29 18:58 ` Ingo Molnar
  0 siblings, 1 reply; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-08-29 14:38 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Arnaldo Carvalho de Melo, Igor Lubashev,
	Karl Rister, Mathieu Poirier, Naveen N. Rao, Nicholas Piggin,
	Steven Rostedt, Arnaldo Carvalho de Melo

Hi Ingo/Thomas,

	Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 42880f726c66f13ae1d9ac9ce4c43abe64ecac84:

  perf/x86/intel: Support PEBS output to PT (2019-08-28 11:29:39 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190829

for you to fetch changes up to 301011ba622513cb41ced59973972204e0da2f71:

  tools lib traceevent: Remove unneeded qsort and uses memmove instead (2019-08-29 08:36:12 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf top:

  Namhyung Kim:

  - Decay all events in the evlist, we were decaying just the first event in a
    group.

  - Fix linking of histograms in different evsels in an event group with more
    than two events.

  With the two fixes above a command line such as:

    # perf top -e '{cycles,instructions,cache-misses,cache-references}'

  should work as expected, with four columns and with all of them being
  decayed over time, i.e. less weight is given to older samples.

perf record:

  Arnaldo Carvalho de Melo:

  - Fix collection of build-ids when using setns() to get into namespaces,
    which had been broken with the introduction of the extra thread to react
    to PERF_RECORD_BPF_EVENT, i.e. to collect extra info for BPF programs.

    We need to unshare(CLONE_FS) in that thread so that the main one can do
    the setns(CLONE_NEWNS) when collecting the build-ids.
    Without that, symbol resolution gets more difficult and symbols can
    potentially be misresolved.

core:

  Igor Lubashev:

  - Further align the capability-based permission checking with how the
    kernel checks what tooling tries to do.

PowerPC:

  Naveen N. Rao:

  - Sync powerpc syscall.tbl, so that 'perf trace' gets the definitions for
    recent syscalls.

libperf:

  Jiri Olsa:

  - Move the rest of the PERF_RECORD_ metadata struct definitions so that we
    can use 'union perf_event'.

libtraceevent:

  Steven Rostedt (VMware):

  - Do not free tep->cmdlines in add_new_comm() on failure.

  - Remove unneeded qsort, use memmove instead.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Arnaldo Carvalho de Melo (4):
      perf tools: Remove needless util.h include from builtin.h
      perf evlist: Remove needless util.h from evlist.h
      perf clang: Delete needless util-cxx.h header
      perf evlist: Use unshare(CLONE_FS) in sb threads to let setns(CLONE_NEWNS) work

Igor Lubashev (5):
      perf event: Check ref_reloc_sym before using it
      perf tools: Use CAP_SYS_ADMIN with perf_event_paranoid checks
      perf evsel: Kernel profiling is disallowed only when perf_event_paranoid > 1
      perf symbols: Use CAP_SYSLOG with kptr_restrict checks
      perf tools: Warn that perf_event_paranoid can restrict kernel symbols

Jiri Olsa (23):
      libperf: Add PERF_RECORD_HEADER_ATTR 'struct attr_event' to perf/event.h
      libperf: Add PERF_RECORD_CPU_MAP 'struct cpu_map_event' to perf/event.h
      libperf: Add PERF_RECORD_EVENT_UPDATE 'struct event_update_event' to perf/event.h
      libperf: Add PERF_RECORD_HEADER_EVENT_TYPE 'struct event_type_event' to perf/event.h
      libperf: Add PERF_RECORD_HEADER_TRACING_DATA 'struct tracing_data_event' to perf/event.h
      libperf: Add PERF_RECORD_HEADER_BUILD_ID 'struct build_id_event' to perf/event.h
      libperf: Add PERF_RECORD_ID_INDEX 'struct id_index_event' to perf/event.h
      libperf: Add PERF_RECORD_AUXTRACE_INFO 'struct auxtrace_info_event' to perf/event.h
      libperf: Add
PERF_RECORD_AUXTRACE 'struct auxtrace_event' to perf/event.h libperf: Add PERF_RECORD_AUXTRACE_ERROR 'struct auxtrace_error_event' to perf/event.h libperf: Add PERF_RECORD_AUX 'struct aux_event' to perf/event.h libperf: Add PERF_RECORD_ITRACE_START 'struct itrace_start_event' to perf/event.h libperf: Add PERF_RECORD_SWITCH 'struct context_switch_event' to perf/event.h libperf: Add PERF_RECORD_THREAD_MAP 'struct thread_map_event' to perf/event.h libperf: Add PERF_RECORD_STAT_CONFIG 'struct stat_config_event' to perf/event.h libperf: Add PERF_RECORD_STAT 'struct stat_event' to perf/event.h libperf: Add PERF_RECORD_STAT_ROUND 'struct stat_round_event' to perf/event.h libperf: Add PERF_RECORD_TIME_CONV 'struct time_conv_event' to perf/event.h libperf: Add PERF_RECORD_HEADER_FEATURE 'struct feature_event' to perf/event.h libperf: Add PERF_RECORD_COMPRESSED 'struct compressed_event' to perf/event.h libperf: Add 'union perf_event' to perf/event.h libperf: Rename the PERF_RECORD_ structs to have a "perf" prefix libperf: Move 'enum perf_user_event_type' to perf/event.h Namhyung Kim (2): perf top: Decay all events in the evlist perf top: Fix event group with more than two events Naveen N. 
Rao (1): perf arch powerpc: Sync powerpc syscall.tbl Steven Rostedt (VMware) (2): tools lib traceevent: Do not free tep->cmdlines in add_new_comm() on failure tools lib traceevent: Remove unneeded qsort and uses memmove instead tools/lib/traceevent/event-parse.c | 58 ++++- tools/perf/arch/arm/util/cs-etm.c | 7 +- tools/perf/arch/arm64/util/arm-spe.c | 5 +- tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 146 +++++++++-- tools/perf/arch/s390/util/auxtrace.c | 2 +- tools/perf/arch/x86/util/intel-bts.c | 6 +- tools/perf/arch/x86/util/intel-pt.c | 7 +- tools/perf/arch/x86/util/tsc.c | 2 +- tools/perf/builtin-buildid-cache.c | 1 + tools/perf/builtin-record.c | 6 +- tools/perf/builtin-report.c | 3 +- tools/perf/builtin-script.c | 3 +- tools/perf/builtin-stat.c | 2 +- tools/perf/builtin-top.c | 47 ++-- tools/perf/builtin-trace.c | 3 +- tools/perf/builtin.h | 2 - tools/perf/lib/include/perf/event.h | 273 ++++++++++++++++++++ tools/perf/perf.c | 1 + tools/perf/tests/cpumap.c | 12 +- tools/perf/tests/event_update.c | 16 +- tools/perf/tests/sdt.c | 1 + tools/perf/tests/stat.c | 8 +- tools/perf/tests/thread-map.c | 2 +- tools/perf/util/arm-spe.c | 6 +- tools/perf/util/auxtrace.c | 21 +- tools/perf/util/auxtrace.h | 8 +- tools/perf/util/bpf-loader.c | 1 + tools/perf/util/build-id.c | 2 +- tools/perf/util/c++/clang-c.h | 2 +- tools/perf/util/c++/clang-test.cpp | 4 +- tools/perf/util/cpumap.c | 6 +- tools/perf/util/cpumap.h | 4 +- tools/perf/util/cs-etm.c | 4 +- tools/perf/util/event.c | 45 ++-- tools/perf/util/event.h | 278 +-------------------- tools/perf/util/evlist.c | 10 + tools/perf/util/evlist.h | 1 - tools/perf/util/evsel.c | 3 +- tools/perf/util/header.c | 57 ++--- tools/perf/util/hist.c | 39 +-- tools/perf/util/hist.h | 1 + tools/perf/util/intel-bts.c | 6 +- tools/perf/util/intel-pt.c | 12 +- tools/perf/util/python.c | 4 +- tools/perf/util/s390-cpumsf.c | 4 +- tools/perf/util/session.c | 29 +-- tools/perf/util/session.h | 2 +- tools/perf/util/stat.c | 12 +- 
 tools/perf/util/symbol.c | 15 +-
 tools/perf/util/thread_map.c | 4 +-
 tools/perf/util/thread_map.h | 4 +-
 tools/perf/util/util-cxx.h | 27 --
 52 files changed, 684 insertions(+), 540 deletions(-)
 delete mode 100644 tools/perf/util/util-cxx.h

Test results:

The first ones are container based builds of tools/perf with and without libelf
support. Where clang is available, it is also used to build perf with/without
libelf, and a build with LIBCLANGLLVM=1 (built-in clang) is done with both gcc
and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from an http server to
build them inside the container, to make it easier to build in a container
cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event_open syscall to check that the perf_event_attr fields are set
up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build
tools/perf/ with a variety of feature sets, exercising the build with an
incomplete set of features as well as with a complete one. It is planned to
have them run on each of the containers mentioned above, using some container
orchestration infrastructure. Get in contact if you are interested in helping
to get this in place.
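
As a rough illustration of what building "with a variety of feature sets" means, the loop below rebuilds tools/perf once per set of feature-disabling make variables. It is only a sketch: the three flag sets are copied from the build-test log in this message, while the function name, its arguments, the print-only default and the use of 'clean all' between variants are assumptions made for the example and not how the build-test target itself is implemented.

```shell
#!/bin/sh
# Sketch only: the flag sets below are copied from the build-test log in
# this message; the function name, its arguments and the print-only default
# are assumptions made for illustration, not part of tools/perf.

perf_build_variants() {
    ksrc=$1              # path to a kernel source tree (hypothetical argument)
    mode=${2:-print}     # "run" really builds; anything else only prints
    for flags in \
        "NO_LIBELF=1" \
        "NO_LIBUNWIND=1" \
        "NO_NEWT=1 NO_SLANG=1 NO_GTK2=1"
    do
        if [ "$mode" = run ]; then
            # $flags is left unquoted on purpose so that each VAR=1 token
            # becomes a separate make argument.
            make -C "$ksrc/tools/perf" $flags clean all || return 1
        else
            echo "make -C $ksrc/tools/perf $flags clean all"
        fi
    done
}

# Print the three command lines without building anything.
perf_build_variants /path/to/linux
```

Passing "run" as the second argument would execute the builds instead of printing them, stopping at the first variant that fails.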
Clearlinux is failing when building with libpython, but that is not a perf regression, will try to remove one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf. # export PERF_TARBALL=http://192.168.124.1/perf/perf-5.3.0-rc6.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20190816 gcc-9-branch@274554, clang version 8.0.0 (tags/RELEASE_800/final) 17 debian:8 : Ok gcc (Debian 
4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 19 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 20 debian:experimental : Ok gcc (Debian 9.2.1-4) 9.2.1 20190821, clang version 7.0.1-9+b1 (tags/RELEASE_701/final) 21 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 22 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 24 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 25 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 26 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 27 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 28 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 29 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 30 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 31 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 32 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 33 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 34 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 35 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 36 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc 
toolchain 2019.03-rc1) 8.3.1 20190225 37 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1) 39 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1) 40 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 41 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 42 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 43 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 44 manjaro:latest : Ok gcc (GCC) 9.1.0, clang version 8.0.1 (tags/RELEASE_801/final) 45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190805 [gcc-9-branch revision 274114], clang version 8.0.1 (tags/RELEASE_801/final 366581) 49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 51 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 52 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 53 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 54 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 55 
ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 62 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 63 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 73 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 74 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc 
(Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 76 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 # # uname -a Linux quaco 5.2.6-200.fc30.x86_64 #1 SMP Mon Aug 5 13:20:47 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 301011ba6225 tools lib traceevent: Remove unneeded qsort and uses memmove instead # perf version --build-options perf version 5.3.rc6.g301011ba6225 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: 
Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args 
filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok # $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_help_O: make help make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_debug_O: make DEBUG=1 make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_gtk2_O: make NO_GTK2=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_install_prefix_O: make install prefix=/tmp/krava make_no_libperl_O: make NO_LIBPERL=1 make_clean_all_O: make clean all make_no_libbpf_O: make NO_LIBBPF=1 make_cscope_O: make cscope make_tags_O: make tags make_no_libbionic_O: make NO_LIBBIONIC=1 make_static_O: make LDFLAGS=-static make_no_libelf_O: make NO_LIBELF=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_no_demangle_O: make NO_DEMANGLE=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_doc_O: make doc make_no_slang_O: make NO_SLANG=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_perf_o_O: make perf.o make_install_bin_O: make install-bin make_pure_O: make make_no_newt_O: make NO_NEWT=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_install_O: make install make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_util_map_o_O: make util/map.o make_no_libnuma_O: make NO_LIBNUMA=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2019-08-29 14:38 Arnaldo Carvalho de Melo @ 2019-08-29 18:58 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-08-29 18:58 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Igor Lubashev, Karl Rister, Mathieu Poirier, Naveen N . Rao, Nicholas Piggin, Steven Rostedt, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit 42880f726c66f13ae1d9ac9ce4c43abe64ecac84: > > perf/x86/intel: Support PEBS output to PT (2019-08-28 11:29:39 +0200) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190829 > > for you to fetch changes up to 301011ba622513cb41ced59973972204e0da2f71: > > tools lib traceevent: Remove unneeded qsort and uses memmove instead (2019-08-29 08:36:12 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > perf top: > > Namhyung Kim: > > - Decay all events in the evlist, we were decaying just the first event > in a group. > > - Fix linking of histograms in different evsels in an event group with more > than two events. > > With the two fixes above a command line such as: > > # perf top -e '{cycles,instructions,cache-misses,cache-references}' > > Should work as expected, with four columns and with all of them being > decayed over time, i.e. less weight is given to older samples. > > perf record: > > Arnaldo Carvalho de Melo: > > - Fix collection of build-ids when using setns() to get into namespaces, > which had been broken with the introduction of the extra thread to > react to PERF_RECORD_BPF_EVENT, i.e. 
to collect extra info for BPF > programs. We need to unshare(CLONE_FS) in that thread so that the > main one can do the setns(CLONE_NEWNS) when collecting the build-ids. > Without that, symbol resolution gets more difficult and potentially > misresolves symbols. > > core: > > Igor Lubashev: > > - Further alignment in permission checking via capabilities to how the > kernel checks what tooling tries to do. > > PowerPC: > > Naveen N. Rao: > > - Sync powerpc syscall.tbl, so that 'perf trace' gets the definitions > for recent syscalls. > > libperf: > > Jiri Olsa: > > - Move the rest of the PERF_RECORD_ metadata struct definitions so that > we can use 'union perf_event'. > > libtraceevent: > > Steven Rostedt (VMware): > > - Do not free tep->cmdlines in add_new_comm() on failure. > > - Remove unneeded qsort and use memmove instead. > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Arnaldo Carvalho de Melo (4): > perf tools: Remove needless util.h include from builtin.h > perf evlist: Remove needless util.h from evlist.h > perf clang: Delete needless util-cxx.h header > perf evlist: Use unshare(CLONE_FS) in sb threads to let setns(CLONE_NEWNS) work > > Igor Lubashev (5): > perf event: Check ref_reloc_sym before using it > perf tools: Use CAP_SYS_ADMIN with perf_event_paranoid checks > perf evsel: Kernel profiling is disallowed only when perf_event_paranoid > 1 > perf symbols: Use CAP_SYSLOG with kptr_restrict checks > perf tools: Warn that perf_event_paranoid can restrict kernel symbols > > Jiri Olsa (23): > libperf: Add PERF_RECORD_HEADER_ATTR 'struct attr_event' to perf/event.h > libperf: Add PERF_RECORD_CPU_MAP 'struct cpu_map_event' to perf/event.h > libperf: Add PERF_RECORD_EVENT_UPDATE 'struct event_update_event' to perf/event.h > libperf: Add PERF_RECORD_HEADER_EVENT_TYPE 'struct event_type_event' to perf/event.h > libperf: Add PERF_RECORD_HEADER_TRACING_DATA 'struct 
tracing_data_event' to perf/event.h > libperf: Add PERF_RECORD_HEADER_BUILD_ID 'struct build_id_event' to perf/event.h > libperf: Add PERF_RECORD_ID_INDEX 'struct id_index_event' to perf/event.h > libperf: Add PERF_RECORD_AUXTRACE_INFO 'struct auxtrace_info_event' to perf/event.h > libperf: Add PERF_RECORD_AUXTRACE 'struct auxtrace_event' to perf/event.h > libperf: Add PERF_RECORD_AUXTRACE_ERROR 'struct auxtrace_error_event' to perf/event.h > libperf: Add PERF_RECORD_AUX 'struct aux_event' to perf/event.h > libperf: Add PERF_RECORD_ITRACE_START 'struct itrace_start_event' to perf/event.h > libperf: Add PERF_RECORD_SWITCH 'struct context_switch_event' to perf/event.h > libperf: Add PERF_RECORD_THREAD_MAP 'struct thread_map_event' to perf/event.h > libperf: Add PERF_RECORD_STAT_CONFIG 'struct stat_config_event' to perf/event.h > libperf: Add PERF_RECORD_STAT 'struct stat_event' to perf/event.h > libperf: Add PERF_RECORD_STAT_ROUND 'struct stat_round_event' to perf/event.h > libperf: Add PERF_RECORD_TIME_CONV 'struct time_conv_event' to perf/event.h > libperf: Add PERF_RECORD_HEADER_FEATURE 'struct feature_event' to perf/event.h > libperf: Add PERF_RECORD_COMPRESSED 'struct compressed_event' to perf/event.h > libperf: Add 'union perf_event' to perf/event.h > libperf: Rename the PERF_RECORD_ structs to have a "perf" prefix > libperf: Move 'enum perf_user_event_type' to perf/event.h > > Namhyung Kim (2): > perf top: Decay all events in the evlist > perf top: Fix event group with more than two events > > Naveen N. 
Rao (1): > perf arch powerpc: Sync powerpc syscall.tbl > > Steven Rostedt (VMware) (2): > tools lib traceevent: Do not free tep->cmdlines in add_new_comm() on failure > tools lib traceevent: Remove unneeded qsort and uses memmove instead > > tools/lib/traceevent/event-parse.c | 58 ++++- > tools/perf/arch/arm/util/cs-etm.c | 7 +- > tools/perf/arch/arm64/util/arm-spe.c | 5 +- > tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 146 +++++++++-- > tools/perf/arch/s390/util/auxtrace.c | 2 +- > tools/perf/arch/x86/util/intel-bts.c | 6 +- > tools/perf/arch/x86/util/intel-pt.c | 7 +- > tools/perf/arch/x86/util/tsc.c | 2 +- > tools/perf/builtin-buildid-cache.c | 1 + > tools/perf/builtin-record.c | 6 +- > tools/perf/builtin-report.c | 3 +- > tools/perf/builtin-script.c | 3 +- > tools/perf/builtin-stat.c | 2 +- > tools/perf/builtin-top.c | 47 ++-- > tools/perf/builtin-trace.c | 3 +- > tools/perf/builtin.h | 2 - > tools/perf/lib/include/perf/event.h | 273 ++++++++++++++++++++ > tools/perf/perf.c | 1 + > tools/perf/tests/cpumap.c | 12 +- > tools/perf/tests/event_update.c | 16 +- > tools/perf/tests/sdt.c | 1 + > tools/perf/tests/stat.c | 8 +- > tools/perf/tests/thread-map.c | 2 +- > tools/perf/util/arm-spe.c | 6 +- > tools/perf/util/auxtrace.c | 21 +- > tools/perf/util/auxtrace.h | 8 +- > tools/perf/util/bpf-loader.c | 1 + > tools/perf/util/build-id.c | 2 +- > tools/perf/util/c++/clang-c.h | 2 +- > tools/perf/util/c++/clang-test.cpp | 4 +- > tools/perf/util/cpumap.c | 6 +- > tools/perf/util/cpumap.h | 4 +- > tools/perf/util/cs-etm.c | 4 +- > tools/perf/util/event.c | 45 ++-- > tools/perf/util/event.h | 278 +-------------------- > tools/perf/util/evlist.c | 10 + > tools/perf/util/evlist.h | 1 - > tools/perf/util/evsel.c | 3 +- > tools/perf/util/header.c | 57 ++--- > tools/perf/util/hist.c | 39 +-- > tools/perf/util/hist.h | 1 + > tools/perf/util/intel-bts.c | 6 +- > tools/perf/util/intel-pt.c | 12 +- > tools/perf/util/python.c | 4 +- > tools/perf/util/s390-cpumsf.c | 4 +- > 
tools/perf/util/session.c | 29 +-- > tools/perf/util/session.h | 2 +- > tools/perf/util/stat.c | 12 +- > tools/perf/util/symbol.c | 15 +- > tools/perf/util/thread_map.c | 4 +- > tools/perf/util/thread_map.h | 4 +- > tools/perf/util/util-cxx.h | 27 -- > 52 files changed, 684 insertions(+), 540 deletions(-) > delete mode 100644 tools/perf/util/util-cxx.h Pulled, thanks a lot Arnaldo! Ingo ^ permalink raw reply [flat|nested] 133+ messages in thread
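The build-id fix quoted above relies on a kernel rule: setns(2) into a mount namespace is refused while the caller still shares its filesystem attributes (root, cwd, umask) with another thread. Below is a minimal sketch of the side-thread half, assuming a hypothetical helper thread rather than perf's actual side-band code; side_thread() and run_side_thread() are illustrative names, not perf functions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for unshare() and CLONE_FS */
#endif
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical side thread: stop sharing cwd/root/umask with the rest of
 * the process, then report whether that worked (0 or an errno value). */
static void *side_thread(void *arg)
{
	int *err = arg;

	*err = unshare(CLONE_FS) == 0 ? 0 : errno;
	/* A real thread would now go on to service its events. */
	return NULL;
}

/* Start the side thread, wait for it, and return 0 or a positive errno. */
int run_side_thread(void)
{
	pthread_t tid;
	int err = 0;
	int rc = pthread_create(&tid, NULL, side_thread, &err);

	if (rc != 0)
		return rc;
	pthread_join(tid, NULL);
	return err;
}
```

Once every extra thread has done this, the main thread no longer shares its filesystem attributes and can attempt setns(fd, CLONE_NEWNS). In a restricted sandbox unshare() may itself be denied, which is why the sketch returns the error code instead of assuming success.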
* [GIT PULL] perf/core improvements and fixes @ 2019-08-27 1:36 Arnaldo Carvalho de Melo 2019-08-27 8:24 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-08-27 1:36 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Andi Kleen, Benjamin Peterson, Gustavo A . R . Silva, James Clark, Souptick Joarder, Arnaldo Carvalho de Melo Hi Ingo/Thomas, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit 39152ee51b77851689f9b23fde6f610d13566c39: perf/x86/intel/pt: Get rid of reverse lookup table for ToPA (2019-08-26 12:00:16 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190826 for you to fetch changes up to 74a1e863eb73dcc9f069b671dfb40650f3832116: perf evsel: Rename perf_missing_features::bpf_event to ::bpf (2019-08-26 19:39:11 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf report: Andi Kleen: - Make --ns time sort key output column wide enough for nanoseconds. perf script: Gustavo A. R. Silva: - Fix memory leaks in list_scripts() perf tests: James Clark: - Fixes hang in zstd compression test by changing the source of random data. perf trace: Arnaldo Carvalho de Melo: - augmented_raw_syscalls.c BPF helper improvements. Benjamin Peterson: - Fix off-by-one error in ioctl cmd->string table. libperf: Jiri Olsa: - Move most PERF_RECORD_ structs to perf/event.h. headers: Arnaldo Carvalho de Melo: - Move cacheline related routines to separate source files. - Move record_opts and other record declarations to separate files. - Explicitly add some more needed headers here and there. Souptick Joarder: - Remove some duplicate include directives. 
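The summary above only names the ioctl cmd->string fix, so the following is a generic sketch of how an offset-based lookup table goes wrong by one, not the patch itself. The struct and function names are hypothetical; the TCGETS to TCSETSF values (0x5401 to 0x5404) are those of the generic Linux tty ioctls. With a base of 0x5400 instead of 0x5401, every lookup would return the neighbouring name.

```c
#include <stddef.h>

/* Hypothetical offset-based string table: entries[0] names the value
 * 'offset', entries[1] names 'offset + 1', and so on. */
struct str_table {
	unsigned int offset;
	size_t nr_entries;
	const char *const *entries;
};

static const char *const tty_cmds[] = {
	"TCGETS", "TCSETS", "TCSETSW", "TCSETSF",
};

const struct str_table tty_table = {
	.offset = 0x5401, /* TCGETS; a base of 0x5400 would shift every name */
	.nr_entries = sizeof(tty_cmds) / sizeof(tty_cmds[0]),
	.entries = tty_cmds,
};

/* Return the name for 'val', or NULL when it falls outside the table. */
const char *str_table_lookup(const struct str_table *t, unsigned int val)
{
	size_t idx;

	if (val < t->offset)
		return NULL;
	idx = val - t->offset;
	if (idx >= t->nr_entries)
		return NULL;
	return t->entries[idx];
}
```

Checking both ends of the range in the lookup keeps a wrong base from turning into an out-of-bounds read, although it cannot detect a base that is merely shifted.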
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Andi Kleen (2): perf report: Use timestamp__scnprintf_nsec() for time sort key perf report: Fix --ns time sort key output Arnaldo Carvalho de Melo (15): perf cpumap: No need to include perf.h, ditch it perf stat: Remove needless headers from stat.h perf record: Move record_opts and other record decls out of perf.h perf cacheline: Move cacheline related routines to separate files perf srcline: Add missing srcline.h header to files needing its defs perf sort: Remove needless headers from sort.h, provide fwd struct decls perf augmented_raw_syscalls: Rename augmented_filename to augmented_arg perf augmented_raw_syscalls: Postpone tmp map lookup to after pid_filter perf augmented_raw_syscalls: Introduce helper to get the scratch space perf augmented_raw_syscalls: Reduce perf_event_output() boilerplate libperf: Rename the PERF_RECORD_ structs to have a "perf" suffix perf tools: Rename perf_event::ksymbol_event to perf_event::ksymbol perf tools: Rename perf_event::bpf_event to perf_event::bpf perf tool: Rename perf_tool::bpf_event to bpf perf evsel: Rename perf_missing_features::bpf_event to ::bpf Benjamin Peterson (1): perf trace beauty ioctl: Fix off-by-one error in cmd->string table Gustavo A. R. 
Silva (1): perf script: Fix memory leaks in list_scripts() James Clark (1): perf tests: Fixes hang in zstd compression test by changing the source of random data Jiri Olsa (12): libperf: Add PERF_RECORD_MMAP 'struct mmap_event' to perf/event.h libperf: Add PERF_RECORD_MMAP2 'struct mmap2_event' to perf/event.h libperf: Add PERF_RECORD_COMM 'struct comm_event' to perf/event.h libperf: Add PERF_RECORD_NAMESPACES 'struct namespaces_event' to perf/event.h libperf: Add PERF_RECORD_FORK 'struct fork_event' to perf/event.h libperf: Add PERF_RECORD_LOST 'struct lost_event' to perf/event.h libperf: Add PERF_RECORD_LOST_SAMPLES 'struct lost_samples_event' to perf/event.h libperf: Add PERF_RECORD_READ 'struct read_event' to perf/event.h libperf: Add PERF_RECORD_THROTTLE 'struct throttle_event' to perf/event.h libperf: Add PERF_RECORD_KSYMBOL 'struct ksymbol_event' to perf/event.h libperf: Add PERF_RECORD_BPF_EVENT 'struct bpf_event' to perf/event.h libperf: Add PERF_RECORD_SAMPLE 'struct sample_event' to perf/event.h Souptick Joarder (1): perf tools: Remove duplicate headers tools/perf/arch/arm/util/cs-etm.c | 2 +- tools/perf/arch/arm64/util/arm-spe.c | 1 + tools/perf/arch/s390/util/auxtrace.c | 1 + tools/perf/arch/x86/tests/perf-time-to-tsc.c | 2 + tools/perf/arch/x86/util/intel-bts.c | 1 + tools/perf/arch/x86/util/intel-pt.c | 3 +- tools/perf/builtin-c2c.c | 1 + tools/perf/builtin-diff.c | 2 + tools/perf/builtin-record.c | 4 +- tools/perf/builtin-report.c | 1 + tools/perf/builtin-sched.c | 2 +- tools/perf/builtin-script.c | 7 +- tools/perf/builtin-stat.c | 2 +- tools/perf/builtin-trace.c | 1 + tools/perf/examples/bpf/augmented_raw_syscalls.c | 100 +++++++-------- tools/perf/lib/include/perf/event.h | 112 ++++++++++++++++ tools/perf/perf.h | 62 --------- tools/perf/tests/backward-ring-buffer.c | 2 +- tools/perf/tests/bpf.c | 1 + tools/perf/tests/code-reading.c | 1 + tools/perf/tests/keep-tracking.c | 1 + tools/perf/tests/openat-syscall-tp-fields.c | 3 +- 
tools/perf/tests/parse-no-sample-id-all.c | 4 +- tools/perf/tests/perf-record.c | 2 +- tools/perf/tests/shell/record+zstd_comp_decomp.sh | 2 +- tools/perf/tests/switch-tracking.c | 1 + tools/perf/tests/task-exit.c | 1 + tools/perf/trace/beauty/ioctl.c | 2 +- tools/perf/ui/browsers/res_sample.c | 2 + tools/perf/ui/browsers/scripts.c | 8 +- tools/perf/ui/stdio/hist.c | 1 + tools/perf/util/Build | 1 + tools/perf/util/annotate.c | 2 + tools/perf/util/auxtrace.c | 2 +- tools/perf/util/bpf-event.c | 36 +++--- tools/perf/util/bpf-event.h | 10 +- tools/perf/util/cacheline.c | 26 ++++ tools/perf/util/cacheline.h | 21 +++ tools/perf/util/callchain.c | 1 + tools/perf/util/cpumap.h | 2 - tools/perf/util/data.c | 1 - tools/perf/util/event.c | 35 +++-- tools/perf/util/event.h | 149 +++++----------------- tools/perf/util/evlist.c | 2 +- tools/perf/util/evsel.c | 22 ++-- tools/perf/util/evsel.h | 4 +- tools/perf/util/get_current_dir_name.c | 1 - tools/perf/util/hist.c | 5 +- tools/perf/util/intel-bts.c | 2 +- tools/perf/util/kvm-stat.h | 2 +- tools/perf/util/machine.c | 25 ++-- tools/perf/util/machine.h | 1 + tools/perf/util/namespaces.c | 2 +- tools/perf/util/namespaces.h | 4 +- tools/perf/util/python.c | 58 ++++----- tools/perf/util/record.c | 1 + tools/perf/util/record.h | 74 +++++++++++ tools/perf/util/session.c | 16 +-- tools/perf/util/sort.c | 12 +- tools/perf/util/sort.h | 27 +--- tools/perf/util/stat-display.c | 1 - tools/perf/util/stat.c | 1 + tools/perf/util/stat.h | 7 +- tools/perf/util/thread.c | 4 +- tools/perf/util/thread.h | 4 +- tools/perf/util/tool.h | 2 +- tools/perf/util/top.h | 1 + tools/perf/util/util.c | 20 --- tools/perf/util/util.h | 1 - 69 files changed, 493 insertions(+), 427 deletions(-) create mode 100644 tools/perf/lib/include/perf/event.h create mode 100644 tools/perf/util/cacheline.c create mode 100644 tools/perf/util/cacheline.h create mode 100644 tools/perf/util/record.h Test results: The first ones are container based builds of tools/perf with and 
without libelf support. Where clang is available, it is also used to build perf with/without libelf, and perf is also built with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from an HTTP server to build them inside the container, to make it easier to build in a container cluster. Those will come back later. Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to the lack of multi-arch devel packages, which are available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications and then intercept the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have them run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping to put this in place. Clearlinux is failing when building with libpython, but that is not a perf regression; I will try to remove the one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf.
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.3.0-rc6.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.2.1 20190816 gcc-9-branch@274554, clang version 8.0.0 (tags/RELEASE_800/final) 17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 19 debian:10 : Ok gcc 
(Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 20 debian:experimental : Ok gcc (Debian 8.3.0-19) 8.3.0, clang version 7.0.1-9 (tags/RELEASE_701/final) 21 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 22 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 24 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 25 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 26 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 27 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 28 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 29 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 30 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 31 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 32 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 33 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 34 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 35 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 36 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 37 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 
8.0.0-3.fc31.1) 39 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1) 40 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 41 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 42 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 43 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 44 manjaro:latest : Ok gcc (GCC) 9.1.0, clang version 8.0.1 (tags/RELEASE_801/final) 45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190805 [gcc-9-branch revision 274114], clang version 8.0.1 (tags/RELEASE_801/final 366581) 49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 51 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 52 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 53 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 54 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 55 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-powerpc : 
Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 62 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 63 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 73 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 74 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 76 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 78 ubuntu:19.10 : Ok gcc (Ubuntu 9.1.0-9ubuntu2) 9.1.0, clang version 8.0.1-+rc4-1 (tags/RELEASE_801/rc4) # # uname -a 
Linux quaco 5.2.6-200.fc30.x86_64 #1 SMP Mon Aug 5 13:20:47 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 74a1e863eb73 perf evsel: Rename perf_missing_features::bpf_event to ::bpf # perf version --build-options perf version 5.3.rc6.g74a1e863eb73 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint 
: Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok $ make -C 
tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_perf_o_O: make perf.o make_no_backtrace_O: make NO_BACKTRACE=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_util_map_o_O: make util/map.o make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_no_gtk2_O: make NO_GTK2=1 make_no_demangle_O: make NO_DEMANGLE=1 make_no_libperl_O: make NO_LIBPERL=1 make_install_bin_O: make install-bin make_tags_O: make tags make_install_prefix_O: make install prefix=/tmp/krava make_with_babeltrace_O: make LIBBABELTRACE=1 make_doc_O: make doc make_cscope_O: make cscope make_pure_O: make make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_libbpf_O: make NO_LIBBPF=1 make_help_O: make help make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_no_auxtrace_O: make NO_AUXTRACE=1 make_no_slang_O: make NO_SLANG=1 make_clean_all_O: make clean all make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_debug_O: make DEBUG=1 make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_no_newt_O: make NO_NEWT=1 make_static_O: make LDFLAGS=-static make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_libelf_O: make NO_LIBELF=1 make_install_O: make install make_no_libunwind_O: make NO_LIBUNWIND=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes
  2019-08-27 1:36 Arnaldo Carvalho de Melo
@ 2019-08-27 8:24 ` Ingo Molnar
From: Ingo Molnar @ 2019-08-27 8:24 UTC
To: Arnaldo Carvalho de Melo
Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams,
	linux-kernel, linux-perf-users, Andi Kleen, Benjamin Peterson,
	Gustavo A . R . Silva, James Clark, Souptick Joarder,
	Arnaldo Carvalho de Melo

* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo/Thomas,
>
> 	Please consider pulling,
>
> Best regards,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit 39152ee51b77851689f9b23fde6f610d13566c39:
>
>   perf/x86/intel/pt: Get rid of reverse lookup table for ToPA (2019-08-26 12:00:16 +0200)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190826
>
> for you to fetch changes up to 74a1e863eb73dcc9f069b671dfb40650f3832116:
>
>   perf evsel: Rename perf_missing_features::bpf_event to ::bpf (2019-08-26 19:39:11 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> perf report:
>
>   Andi Kleen:
>
>   - Make --ns time sort key output column wide enough for nanoseconds.
>
> perf script:
>
>   Gustavo A. R. Silva:
>
>   - Fix memory leaks in list_scripts()
>
> perf tests:
>
>   James Clark:
>
>   - Fixes hang in zstd compression test by changing the source of random data.
>
> perf trace:
>
>   Arnaldo Carvalho de Melo:
>
>   - augmented_raw_syscalls.c BPF helper improvements.
>
>   Benjamin Peterson:
>
>   - Fix off-by-one error in ioctl cmd->string table.
>
> libperf:
>
>   Jiri Olsa:
>
>   - Move most PERF_RECORD_ structs to perf/event.h.
>
> headers:
>
>   Arnaldo Carvalho de Melo:
>
>   - Move cacheline related routines to separate source files.
>
>   - Move record_opts and other record declarations to separate files.
> > - Explicitly add some more needed headers here and there. > > Souptick Joarder: > > - Remove some duplicate include directives. > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Andi Kleen (2): > perf report: Use timestamp__scnprintf_nsec() for time sort key > perf report: Fix --ns time sort key output > > Arnaldo Carvalho de Melo (15): > perf cpumap: No need to include perf.h, ditch it > perf stat: Remove needless headers from stat.h > perf record: Move record_opts and other record decls out of perf.h > perf cacheline: Move cacheline related routines to separate files > perf srcline: Add missing srcline.h header to files needing its defs > perf sort: Remove needless headers from sort.h, provide fwd struct decls > perf augmented_raw_syscalls: Rename augmented_filename to augmented_arg > perf augmented_raw_syscalls: Postpone tmp map lookup to after pid_filter > perf augmented_raw_syscalls: Introduce helper to get the scratch space > perf augmented_raw_syscalls: Reduce perf_event_output() boilerplate > libperf: Rename the PERF_RECORD_ structs to have a "perf" suffix > perf tools: Rename perf_event::ksymbol_event to perf_event::ksymbol > perf tools: Rename perf_event::bpf_event to perf_event::bpf > perf tool: Rename perf_tool::bpf_event to bpf > perf evsel: Rename perf_missing_features::bpf_event to ::bpf > > Benjamin Peterson (1): > perf trace beauty ioctl: Fix off-by-one error in cmd->string table > > Gustavo A. R. 
Silva (1): > perf script: Fix memory leaks in list_scripts() > > James Clark (1): > perf tests: Fixes hang in zstd compression test by changing the source of random data > > Jiri Olsa (12): > libperf: Add PERF_RECORD_MMAP 'struct mmap_event' to perf/event.h > libperf: Add PERF_RECORD_MMAP2 'struct mmap2_event' to perf/event.h > libperf: Add PERF_RECORD_COMM 'struct comm_event' to perf/event.h > libperf: Add PERF_RECORD_NAMESPACES 'struct namespaces_event' to perf/event.h > libperf: Add PERF_RECORD_FORK 'struct fork_event' to perf/event.h > libperf: Add PERF_RECORD_LOST 'struct lost_event' to perf/event.h > libperf: Add PERF_RECORD_LOST_SAMPLES 'struct lost_samples_event' to perf/event.h > libperf: Add PERF_RECORD_READ 'struct read_event' to perf/event.h > libperf: Add PERF_RECORD_THROTTLE 'struct throttle_event' to perf/event.h > libperf: Add PERF_RECORD_KSYMBOL 'struct ksymbol_event' to perf/event.h > libperf: Add PERF_RECORD_BPF_EVENT 'struct bpf_event' to perf/event.h > libperf: Add PERF_RECORD_SAMPLE 'struct sample_event' to perf/event.h > > Souptick Joarder (1): > perf tools: Remove duplicate headers > > tools/perf/arch/arm/util/cs-etm.c | 2 +- > tools/perf/arch/arm64/util/arm-spe.c | 1 + > tools/perf/arch/s390/util/auxtrace.c | 1 + > tools/perf/arch/x86/tests/perf-time-to-tsc.c | 2 + > tools/perf/arch/x86/util/intel-bts.c | 1 + > tools/perf/arch/x86/util/intel-pt.c | 3 +- > tools/perf/builtin-c2c.c | 1 + > tools/perf/builtin-diff.c | 2 + > tools/perf/builtin-record.c | 4 +- > tools/perf/builtin-report.c | 1 + > tools/perf/builtin-sched.c | 2 +- > tools/perf/builtin-script.c | 7 +- > tools/perf/builtin-stat.c | 2 +- > tools/perf/builtin-trace.c | 1 + > tools/perf/examples/bpf/augmented_raw_syscalls.c | 100 +++++++-------- > tools/perf/lib/include/perf/event.h | 112 ++++++++++++++++ > tools/perf/perf.h | 62 --------- > tools/perf/tests/backward-ring-buffer.c | 2 +- > tools/perf/tests/bpf.c | 1 + > tools/perf/tests/code-reading.c | 1 + > 
tools/perf/tests/keep-tracking.c | 1 + > tools/perf/tests/openat-syscall-tp-fields.c | 3 +- > tools/perf/tests/parse-no-sample-id-all.c | 4 +- > tools/perf/tests/perf-record.c | 2 +- > tools/perf/tests/shell/record+zstd_comp_decomp.sh | 2 +- > tools/perf/tests/switch-tracking.c | 1 + > tools/perf/tests/task-exit.c | 1 + > tools/perf/trace/beauty/ioctl.c | 2 +- > tools/perf/ui/browsers/res_sample.c | 2 + > tools/perf/ui/browsers/scripts.c | 8 +- > tools/perf/ui/stdio/hist.c | 1 + > tools/perf/util/Build | 1 + > tools/perf/util/annotate.c | 2 + > tools/perf/util/auxtrace.c | 2 +- > tools/perf/util/bpf-event.c | 36 +++--- > tools/perf/util/bpf-event.h | 10 +- > tools/perf/util/cacheline.c | 26 ++++ > tools/perf/util/cacheline.h | 21 +++ > tools/perf/util/callchain.c | 1 + > tools/perf/util/cpumap.h | 2 - > tools/perf/util/data.c | 1 - > tools/perf/util/event.c | 35 +++-- > tools/perf/util/event.h | 149 +++++----------------- > tools/perf/util/evlist.c | 2 +- > tools/perf/util/evsel.c | 22 ++-- > tools/perf/util/evsel.h | 4 +- > tools/perf/util/get_current_dir_name.c | 1 - > tools/perf/util/hist.c | 5 +- > tools/perf/util/intel-bts.c | 2 +- > tools/perf/util/kvm-stat.h | 2 +- > tools/perf/util/machine.c | 25 ++-- > tools/perf/util/machine.h | 1 + > tools/perf/util/namespaces.c | 2 +- > tools/perf/util/namespaces.h | 4 +- > tools/perf/util/python.c | 58 ++++----- > tools/perf/util/record.c | 1 + > tools/perf/util/record.h | 74 +++++++++++ > tools/perf/util/session.c | 16 +-- > tools/perf/util/sort.c | 12 +- > tools/perf/util/sort.h | 27 +--- > tools/perf/util/stat-display.c | 1 - > tools/perf/util/stat.c | 1 + > tools/perf/util/stat.h | 7 +- > tools/perf/util/thread.c | 4 +- > tools/perf/util/thread.h | 4 +- > tools/perf/util/tool.h | 2 +- > tools/perf/util/top.h | 1 + > tools/perf/util/util.c | 20 --- > tools/perf/util/util.h | 1 - > 69 files changed, 493 insertions(+), 427 deletions(-) > create mode 100644 tools/perf/lib/include/perf/event.h > create mode 100644 
tools/perf/util/cacheline.c
> create mode 100644 tools/perf/util/cacheline.h
> create mode 100644 tools/perf/util/record.h

Pulled, thanks a lot Arnaldo!

	Ingo
* [GIT PULL] perf/core improvements and fixes @ 2019-08-22 21:00 Arnaldo Carvalho de Melo 2019-08-23 10:30 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-08-22 21:00 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Gerald Baeza, Nageswara R Sastry, Ravi Bangoria, Arnaldo Carvalho de Melo Hi Ingo/Thomas, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit 4e92b18e5b0b61211f4511cdbc5803300eeead40: Merge tag 'perf-core-for-mingo-5.4-20190820' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-08-20 21:38:22 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190822 for you to fetch changes up to d9c5c083416500e95da098c01be092b937def7fa: libperf: Fix alignment trap with xyarray contents in 'perf stat' (2019-08-22 17:16:57 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf c2c: Ravi Bangoria: - Fix report with offline cpus. libperf: Gerald BAEZA: - Fix alignment trap with xyarray contents in 'perf stat', noticed on ARMv7. Jiri Olsa: - Move some more cpu_map and thread_map methods from tools/perf/util/ to libperf. headers: Arnaldo Carvalho de Melo: - Do some house cleaning on the headers, removing needless includes in some places, providing forward declarations when those are the only thing needed, and fixing up the fallout from that for cases where we were using stuff and not adding the necessary headers. Should speed up the build and avoid needless rebuilds when something unrelated gets touched. 
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Arnaldo Carvalho de Melo (18): perf arm64: Add missing debug.h header perf kvm s390: Add missing string.h header perf metricgroup: Remove needless includes from metricgroup.h perf evsel: Move xyarray.h from evsel.c to evsel.h to reduce include dep tree perf counts: Add missing headers needed for types used perf bpf: Add missing xyarray.h header perf evlist: Add missing xyarray.h header perf script: Add missing counts.h perf tests: Add missing counts.h perf stat: Add missing counts.h perf scripting python: Add missing counts.h header perf evsel: Add missing perf/evsel.h header in util/evsel.h perf evsel: Remove needless counts.h header from util/evsel.h perf evsel: Remove needless stddef.h from util/evsel.h perf evsel: util/evsel.h needs stdio.h as it uses FILE perf x86 kvm-stat: Add missing string.h header perf evsel: Switch to libperf's cpumap.h perf cpumap: Remove needless includes from cpumap.h Gerald BAEZA (1): libperf: Fix alignment trap with xyarray contents in 'perf stat' Jiri Olsa (5): tools headers: Add missing perf_event.h include perf tools: Use perf_cpu_map__nr instead of cpu_map__nr libperf: Move perf's cpu_map__empty() to perf_cpu_map__empty() libperf: Move perf's cpu_map__idx() to perf_cpu_map__idx() libperf: Add perf_thread_map__nr/perf_thread_map__pid functions Ravi Bangoria (1): perf c2c: Fix report with offline cpus tools/include/linux/ring_buffer.h | 1 + tools/perf/arch/arm/util/cs-etm.c | 12 ++++---- tools/perf/arch/arm64/util/header.c | 1 + tools/perf/arch/s390/util/kvm-stat.c | 1 + tools/perf/arch/x86/util/header.c | 1 + tools/perf/arch/x86/util/intel-bts.c | 4 +-- tools/perf/arch/x86/util/intel-pt.c | 10 +++---- tools/perf/arch/x86/util/kvm-stat.c | 1 + tools/perf/builtin-c2c.c | 4 +-- tools/perf/builtin-ftrace.c | 2 +- tools/perf/builtin-script.c | 5 ++-- tools/perf/builtin-stat.c | 8 +++--- tools/perf/builtin-trace.c | 
4 +-- tools/perf/lib/cpumap.c | 17 ++++++++++++ tools/perf/lib/include/internal/cpumap.h | 2 ++ tools/perf/lib/include/internal/xyarray.h | 3 +- tools/perf/lib/include/perf/cpumap.h | 2 ++ tools/perf/lib/include/perf/threadmap.h | 2 ++ tools/perf/lib/libperf.map | 3 ++ tools/perf/lib/threadmap.c | 10 +++++++ tools/perf/tests/mem2node.c | 1 + tools/perf/tests/openat-syscall-all-cpus.c | 1 + tools/perf/tests/openat-syscall.c | 1 + tools/perf/tests/thread-map.c | 6 ++-- tools/perf/util/auxtrace.c | 4 +-- tools/perf/util/bpf-loader.c | 2 ++ tools/perf/util/counts.h | 4 +++ tools/perf/util/cpumap.c | 22 ++++----------- tools/perf/util/cpumap.h | 17 ++---------- tools/perf/util/cputopo.c | 2 ++ tools/perf/util/env.c | 1 + tools/perf/util/event.c | 10 +++---- tools/perf/util/evlist.c | 32 ++++++++++++---------- tools/perf/util/evsel.c | 6 ++-- tools/perf/util/evsel.h | 12 +++++--- tools/perf/util/mem2node.c | 1 + tools/perf/util/metricgroup.c | 3 +- tools/perf/util/metricgroup.h | 13 +++++---- tools/perf/util/mmap.c | 2 +- tools/perf/util/pmu.c | 1 + tools/perf/util/record.c | 2 +- .../util/scripting-engines/trace-event-python.c | 3 +- tools/perf/util/stat-display.c | 7 +++-- tools/perf/util/stat.c | 7 +++-- tools/perf/util/svghelper.c | 1 + tools/perf/util/thread_map.c | 4 +-- tools/perf/util/thread_map.h | 10 ------- 47 files changed, 155 insertions(+), 113 deletions(-) Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. 
Several are cross builds (the ones with -x-ARCH and the android one), and those may not have all the features built, due to the lack of multi-arch devel packages, which are available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one performs a variety of tests exercising tools/perf/util/ and tools/lib/{bpf,traceevent,etc}. It also runs perf commands with a variety of command line event specifications and intercepts the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if you are interested in helping to get this in place.

Clearlinux is failing when building with libpython, but that is not a perf regression. I will try to remove the one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf.
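
As a rough illustration of how the 'perf test' logs in this thread can be summarized, here is a minimal Python sketch. It is not part of perf or of this pull request: the helper names parse_perf_test and summarize are hypothetical, and it assumes the one-result-per-line layout "N: description : status" seen in the logs here, treating any status other than Ok or Skip as a failure.

```python
from collections import Counter


def parse_perf_test(output):
    """Return (index, description, status) tuples from 'perf test' output.

    Assumes one result per line, as in the logs quoted in this thread.
    """
    results = []
    for line in output.splitlines():
        index, sep, rest = line.strip().partition(":")
        # Test lines start with a number such as "22" or "22.1".
        if not sep or not index.replace(".", "", 1).isdigit():
            continue
        # The status follows the last colon; descriptions may contain colons.
        name, sep, status = rest.rpartition(":")
        if not sep:
            continue
        results.append((index, name.strip(), status.strip()))
    return results


def summarize(results):
    """Count Ok / Skip / other statuses, ignoring group headers."""
    counts = Counter()
    for _index, _name, status in results:
        if not status:
            continue  # group header such as "22: Watchpoint :"
        if status.startswith("Ok"):
            counts["ok"] += 1
        elif status.startswith("Skip"):
            counts["skip"] += 1
        else:
            counts["failed"] += 1  # assumption: anything else is a failure
    return counts


# A few lines copied from the log in this message.
SAMPLE = """\
 1: vmlinux symtab matches kallsyms : Ok
22: Watchpoint :
22.1: Read Only Watchpoint : Skip
22.2: Write Only Watchpoint : Ok
56: builtin clang support : Skip (not compiled in)
70: Check open filename arg using perf trace + vfs_getname: Ok
"""

if __name__ == "__main__":
    print(dict(summarize(parse_perf_test(SAMPLE))))
```

The status is taken from the last colon on the line rather than the first, since descriptions such as "syscalls:sys_enter_openat event fields" contain colons themselves.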
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.3.0-rc5.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.1.1 20190808 gcc-9-branch@274204, clang version 8.0.0 (tags/RELEASE_800/final) 17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 19 debian:10 : Ok gcc 
(Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 20 debian:experimental : Ok gcc (Debian 8.3.0-19) 8.3.0, clang version 7.0.1-9 (tags/RELEASE_701/final) 21 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 22 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 24 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 25 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 26 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 27 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 28 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 29 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 30 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 31 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 32 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 33 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 34 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 35 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 36 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 37 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 
8.0.0-3.fc31.1) 39 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1) 40 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 41 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 42 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 43 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 44 manjaro:latest : Ok gcc (GCC) 9.1.0, clang version 8.0.1 (tags/RELEASE_801/final) 45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190805 [gcc-9-branch revision 274114], clang version 8.0.1 (tags/RELEASE_801/final 366581) 49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 51 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 52 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 53 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 54 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 55 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-powerpc : 
Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 62 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 63 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 73 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 74 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 76 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 77 ubuntu:19.10 : Ok gcc (Ubuntu 9.1.0-9ubuntu2) 9.1.0, clang version 8.0.1-+rc4-1 (tags/RELEASE_801/rc4) # uname -a Linux 
quaco 5.2.6-200.fc30.x86_64 #1 SMP Mon Aug 5 13:20:47 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 d9c5c0834165 libperf: Fix alignment trap with xyarray contents in 'perf stat' # perf version --build-options perf version 5.3.rc5.gd9c5c0834165 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : 
Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok # $ make -C 
tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_debug_O: make DEBUG=1 make_doc_O: make doc make_no_newt_O: make NO_NEWT=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_pure_O: make make_util_pmu_bison_o_O: make util/pmu-bison.o make_no_libunwind_O: make NO_LIBUNWIND=1 make_tags_O: make tags make_with_clangllvm_O: make LIBCLANGLLVM=1 make_static_O: make LDFLAGS=-static make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_slang_O: make NO_SLANG=1 make_help_O: make help make_no_libbpf_O: make NO_LIBBPF=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_clean_all_O: make clean all make_no_libelf_O: make NO_LIBELF=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_install_O: make install make_install_prefix_O: make install prefix=/tmp/krava make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_gtk2_O: make NO_GTK2=1 make_no_demangle_O: make NO_DEMANGLE=1 make_cscope_O: make cscope make_with_babeltrace_O: make LIBBABELTRACE=1 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_install_bin_O: make install-bin make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_perf_o_O: make perf.o make_util_map_o_O: make util/map.o make_no_libnuma_O: make NO_LIBNUMA=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_no_libperl_O: make NO_LIBPERL=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2019-08-22 21:00 Arnaldo Carvalho de Melo @ 2019-08-23 10:30 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-08-23 10:30 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Gerald Baeza, Nageswara R Sastry, Ravi Bangoria, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit 4e92b18e5b0b61211f4511cdbc5803300eeead40: > > Merge tag 'perf-core-for-mingo-5.4-20190820' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-08-20 21:38:22 +0200) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190822 > > for you to fetch changes up to d9c5c083416500e95da098c01be092b937def7fa: > > libperf: Fix alignment trap with xyarray contents in 'perf stat' (2019-08-22 17:16:57 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > perf c2c: > > Ravi Bangoria: > > - Fix report with offline cpus. > > libperf: > > Gerald BAEZA: > > - Fix alignment trap with xyarray contents in 'perf stat', noticed on ARMv7. > > Jiri Olsa: > > - Move some more cpu_map and thread_map methods from tools/perf/util/ to libperf. > > headers: > > Arnaldo Carvalho de Melo: > > - Do some house cleaning on the headers, removing needless includes in some places, > providing forward declarations when those are the only thing needed, and fixing > up the fallout from that for cases where we were using stuff and not adding the > necessary headers. Should speed up the build and avoid needless rebuilds when > something unrelated gets touched. 
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Arnaldo Carvalho de Melo (18): > perf arm64: Add missing debug.h header > perf kvm s390: Add missing string.h header > perf metricgroup: Remove needless includes from metricgroup.h > perf evsel: Move xyarray.h from evsel.c to evsel.h to reduce include dep tree > perf counts: Add missing headers needed for types used > perf bpf: Add missing xyarray.h header > perf evlist: Add missing xyarray.h header > perf script: Add missing counts.h > perf tests: Add missing counts.h > perf stat: Add missing counts.h > perf scripting python: Add missing counts.h header > perf evsel: Add missing perf/evsel.h header in util/evsel.h > perf evsel: Remove needless counts.h header from util/evsel.h > perf evsel: Remove needless stddef.h from util/evsel.h > perf evsel: util/evsel.h needs stdio.h as it uses FILE > perf x86 kvm-stat: Add missing string.h header > perf evsel: Switch to libperf's cpumap.h > perf cpumap: Remove needless includes from cpumap.h > > Gerald BAEZA (1): > libperf: Fix alignment trap with xyarray contents in 'perf stat' > > Jiri Olsa (5): > tools headers: Add missing perf_event.h include > perf tools: Use perf_cpu_map__nr instead of cpu_map__nr > libperf: Move perf's cpu_map__empty() to perf_cpu_map__empty() > libperf: Move perf's cpu_map__idx() to perf_cpu_map__idx() > libperf: Add perf_thread_map__nr/perf_thread_map__pid functions > > Ravi Bangoria (1): > perf c2c: Fix report with offline cpus > > tools/include/linux/ring_buffer.h | 1 + > tools/perf/arch/arm/util/cs-etm.c | 12 ++++---- > tools/perf/arch/arm64/util/header.c | 1 + > tools/perf/arch/s390/util/kvm-stat.c | 1 + > tools/perf/arch/x86/util/header.c | 1 + > tools/perf/arch/x86/util/intel-bts.c | 4 +-- > tools/perf/arch/x86/util/intel-pt.c | 10 +++---- > tools/perf/arch/x86/util/kvm-stat.c | 1 + > tools/perf/builtin-c2c.c | 4 +-- > tools/perf/builtin-ftrace.c | 2 +- > 
tools/perf/builtin-script.c | 5 ++-- > tools/perf/builtin-stat.c | 8 +++--- > tools/perf/builtin-trace.c | 4 +-- > tools/perf/lib/cpumap.c | 17 ++++++++++++ > tools/perf/lib/include/internal/cpumap.h | 2 ++ > tools/perf/lib/include/internal/xyarray.h | 3 +- > tools/perf/lib/include/perf/cpumap.h | 2 ++ > tools/perf/lib/include/perf/threadmap.h | 2 ++ > tools/perf/lib/libperf.map | 3 ++ > tools/perf/lib/threadmap.c | 10 +++++++ > tools/perf/tests/mem2node.c | 1 + > tools/perf/tests/openat-syscall-all-cpus.c | 1 + > tools/perf/tests/openat-syscall.c | 1 + > tools/perf/tests/thread-map.c | 6 ++-- > tools/perf/util/auxtrace.c | 4 +-- > tools/perf/util/bpf-loader.c | 2 ++ > tools/perf/util/counts.h | 4 +++ > tools/perf/util/cpumap.c | 22 ++++----------- > tools/perf/util/cpumap.h | 17 ++---------- > tools/perf/util/cputopo.c | 2 ++ > tools/perf/util/env.c | 1 + > tools/perf/util/event.c | 10 +++---- > tools/perf/util/evlist.c | 32 ++++++++++++---------- > tools/perf/util/evsel.c | 6 ++-- > tools/perf/util/evsel.h | 12 +++++--- > tools/perf/util/mem2node.c | 1 + > tools/perf/util/metricgroup.c | 3 +- > tools/perf/util/metricgroup.h | 13 +++++---- > tools/perf/util/mmap.c | 2 +- > tools/perf/util/pmu.c | 1 + > tools/perf/util/record.c | 2 +- > .../util/scripting-engines/trace-event-python.c | 3 +- > tools/perf/util/stat-display.c | 7 +++-- > tools/perf/util/stat.c | 7 +++-- > tools/perf/util/svghelper.c | 1 + > tools/perf/util/thread_map.c | 4 +-- > tools/perf/util/thread_map.h | 10 ------- > 47 files changed, 155 insertions(+), 113 deletions(-) Pulled, thanks a lot Arnaldo! Ingo ^ permalink raw reply [flat|nested] 133+ messages in thread
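The header cleanup quoted above leans on a standard C technique: a header that only passes around pointers to a struct can forward-declare it instead of including the header that defines it, so touching the defining header no longer rebuilds every user. A minimal sketch follows, assuming hypothetical names (`struct grid`, `struct counters`, `counters_nr()`, `counters_cells()`) that are not taken from the perf sources; the two "files" are shown in one translation unit only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* --- what would live in a hypothetical counters.h --- */

/* Forward declaration: enough for pointers, no need to include grid.h. */
struct grid;

struct counters {
	struct grid *values;	/* pointer to an incomplete type is allowed */
	size_t nr;
};

size_t counters_nr(const struct counters *c);
size_t counters_cells(const struct counters *c);

/* --- what would live in a hypothetical counters.c --- */

/* Only code that looks inside the struct needs the full definition. */
struct grid {
	size_t rows;
	size_t cols;
};

/* Works with the forward declaration alone: never dereferences 'values'. */
size_t counters_nr(const struct counters *c)
{
	assert(c != NULL);
	return c->nr;
}

/* Needs the complete 'struct grid' because it reads its members. */
size_t counters_cells(const struct counters *c)
{
	assert(c != NULL && c->values != NULL);
	return c->values->rows * c->values->cols;
}
```

In this sketch only `counters_cells()` depends on the layout of `struct grid`, so only the source file that defines it would need the full header.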
* [GIT PULL] perf/core improvements and fixes @ 2019-08-20 19:27 Arnaldo Carvalho de Melo 2019-08-20 19:39 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-08-20 19:27 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Alexey Budankov, Guenter Roeck, Leo Yan, Mathieu Poirier, Steven Rostedt, Tzvetomir Stoyanov, Arnaldo Carvalho de Melo Hi Ingo/Thomas, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit cfb104ca8a26affb28d81720a4ed49c30b2a3b01: Merge tag 'perf-core-for-mingo-5.4-20190816' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-08-16 22:43:42 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190820 for you to fetch changes up to b81d39c7a1efb83caa3f4419939a46e96191abb6: libperf: Fix arch include paths (2019-08-20 12:29:36 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: callchains: Alexey Budankov: - Allow collecting LBR together with DWARF callchains, for workloads where the userspace stack size collected is not big enough for pure DWARF based unwinding. - Dump the LBR call stack in 'perf report -D'. perf top: Arnaldo Carvalho de Melo: - Show visual cue at start to state that the minimal set of samples are being collected prior to sorting/bucketizing/displaying. CoreSight (ARM hardware tracing): Leo Yan: - Support sample flags 'insn' and 'insnlen'. core: Adrian Hunter: - Add comment for 'idx' member in 'struct perf_sample_id. tools headers: Arnaldo Carvalho de Melo: - Synchronize linux/bits.h, which required grabbing a copy of the kernel const.h headers and some changes in the ordering of header directories. 
    - Sync x86's asm/cpufeatures.h with the kernel, no change in
      any of the tools.

libperf:

  Jiri Olsa:

    - Fix arch include paths.

libtraceevent:

  Steven Rostedt (VMware):

    - Fix "robust" test of do_generate_dynamic_list_file.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (1):
      perf evsel: Add comment for 'idx' member in 'struct perf_sample_id

Alexey Budankov (3):
      perf record: Enable LBR callstack capture jointly with thread stack
      perf report: Dump LBR callstack data by -D jointly with thread stack
      perf report: Prefer DWARF callstacks to LBR ones when captured both

Arnaldo Carvalho de Melo (10):
      tools headers: Add limits.h to access __WORDSIZE
      perf tools: tools/include should come before tools/uapi/include
      tools headers: Grab copy of linux/const.h, needed by linux/bits.h
      tools headers: Synchronize linux/bits.h with the kernel sources
      tools arch x86: Sync asm/cpufeatures.h with the with the kernel
      perf ui: Make 'exit_msg' optional in ui__question_window()
      perf ui: Introduce non-interactive ui__info_window() function
      perf ui browser: Allow specifying message to show when no samples are available to display
      perf top: Show info message while collecting samples
      tools headers: Fixup bitsperlong per arch includes

Jiri Olsa (1):
      libperf: Fix arch include paths

Leo Yan (1):
      perf cs-etm: Support sample flags 'insn' and 'insnlen'

Steven Rostedt (VMware) (1):
      tools lib traceevent: Fix "robust" test of do_generate_dynamic_list_file

 tools/arch/x86/include/asm/cpufeatures.h |  3 +++
 tools/include/linux/bitops.h             |  1 +
 tools/include/linux/bits.h               | 17 +++++++++------
 tools/include/linux/const.h              |  9 ++++++++
 tools/include/uapi/asm/bitsperlong.h     | 18 ++++++++--------
 tools/include/uapi/linux/const.h         | 31 ++++++++++++++++++++++++++
 tools/lib/traceevent/Makefile            |  4 ++--
 tools/perf/Makefile.config               |  2 +-
 tools/perf/builtin-report.c              |  2 ++
 tools/perf/check-headers.sh              |  2 ++
 tools/perf/lib/Makefile                  |
2 +- tools/perf/ui/browser.c | 2 ++ tools/perf/ui/browser.h | 1 + tools/perf/ui/browsers/hists.c | 3 +++ tools/perf/ui/tui/util.c | 37 ++++++++++++++++++++++---------- tools/perf/ui/util.h | 2 ++ tools/perf/util/cs-etm.c | 35 +++++++++++++++++++++++++++++- tools/perf/util/evsel.h | 7 ++++++ tools/perf/util/parse-branch-options.c | 1 + tools/perf/util/session.c | 31 +++++++++++++++----------- 20 files changed, 166 insertions(+), 44 deletions(-) create mode 100644 tools/include/linux/const.h create mode 100644 tools/include/uapi/linux/const.h Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. 
It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping having this in place. Clearlinux is failing when building with libpython, but that is not a perf regression, will try to remove one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf. # export PERF_TARBALL=http://192.168.124.1/perf/perf-5.3.0-rc4.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 
(tags/RELEASE_34/dot2-final) 16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.1.1 20190808 gcc-9-branch@274204, clang version 8.0.0 (tags/RELEASE_800/final) 17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 19 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 20 debian:experimental : Ok gcc (Debian 8.3.0-19) 8.3.0, clang version 7.0.1-9 (tags/RELEASE_701/final) 21 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 22 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 24 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 25 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 26 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 27 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 28 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 29 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 30 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 31 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 32 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 33 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 34 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 
(Fedora 7.0.1-6.fc29) 35 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 36 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 37 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1) 39 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1) 40 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 41 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/fi 42 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 43 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 44 manjaro:latest : Ok gcc (GCC) 9.1.0, clang version 8.0.1 (tags/RELEASE_801/final) 45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190723 [gcc-9-branch revision 273734], clang version 8.0.1 (tags/RELEASE_801/final 366581) 49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 51 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 52 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 53 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 
3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 54 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 55 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 62 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 63 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 73 ubuntu:19.04 : Ok gcc (Ubuntu 
8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 74 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 76 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 77 ubuntu:19.10 : Ok gcc (Ubuntu 9.1.0-9ubuntu2) 9.1.0, clang version 8.0.1-+rc4-1 (tags/RELEASE_801/rc4) # # uname -a Linux quaco 5.2.6-200.fc30.x86_64 #1 SMP Mon Aug 5 13:20:47 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 b81d39c7a1ef libperf: Fix arch include paths # perf version --build-options perf version 5.3.rc4.gb81d39c7a1ef dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched 
tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 
instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok # $ make -C tools/perf build-test make: Entering directory `/home/acme/git/linux/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_no_libelf_O: make NO_LIBELF=1 make_util_map_o_O: make util/map.o make_debug_O: make DEBUG=1 make_no_libperl_O: make NO_LIBPERL=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_no_backtrace_O: make NO_BACKTRACE=1 make_install_O: make install make_perf_o_O: make perf.o make_pure_O: make make_no_gtk2_O: make NO_GTK2=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_doc_O: make doc make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_help_O: make help make_install_bin_O: make install-bin make_no_demangle_O: make NO_DEMANGLE=1 make_no_newt_O: make NO_NEWT=1 make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_install_prefix_O: make install prefix=/tmp/krava make_static_O: make LDFLAGS=-static make_no_auxtrace_O: make NO_AUXTRACE=1 make_no_slang_O: make NO_SLANG=1 make_cscope_O: make cscope make_with_babeltrace_O: make LIBBABELTRACE=1 make_clean_all_O: make clean all make_no_libbpf_O: make 
NO_LIBBPF=1 make_tags_O: make tags make_no_libpython_O: make NO_LIBPYTHON=1 OK make: Leaving directory `/home/acme/git/linux/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes 2019-08-20 19:27 Arnaldo Carvalho de Melo @ 2019-08-20 19:39 ` Ingo Molnar 2019-08-20 19:44 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 133+ messages in thread From: Ingo Molnar @ 2019-08-20 19:39 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Alexey Budankov, Guenter Roeck, Leo Yan, Mathieu Poirier, Steven Rostedt, Tzvetomir Stoyanov, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo/Thomas, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit cfb104ca8a26affb28d81720a4ed49c30b2a3b01: > > Merge tag 'perf-core-for-mingo-5.4-20190816' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-08-16 22:43:42 +0200) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190820 > > for you to fetch changes up to b81d39c7a1efb83caa3f4419939a46e96191abb6: > > libperf: Fix arch include paths (2019-08-20 12:29:36 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > callchains: > > Alexey Budankov: > > - Allow collecting LBR together with DWARF callchains, for workloads > where the userspace stack size collected is not big enough for > pure DWARF based unwinding. > > - Dump the LBR call stack in 'perf report -D'. > > perf top: > > Arnaldo Carvalho de Melo: > > - Show visual cue at start to state that the minimal set of samples > are being collected prior to sorting/bucketizing/displaying. > > CoreSight (ARM hardware tracing): > > Leo Yan: > > - Support sample flags 'insn' and 'insnlen'. > > core: > > Adrian Hunter: > > - Add comment for 'idx' member in 'struct perf_sample_id. 
>
> tools headers:
>
>   Arnaldo Carvalho de Melo:
>
>     - Synchronize linux/bits.h, which required grabbing a copy of the kernel
>       const.h headers and some changes in the ordering of header directories.
>
>     - Sync x86's asm/cpufeatures.h with the kernel, no change in
>       any of the tools.
>
> libperf:
>
>   Jiri Olsa:
>
>     - Fix arch include paths.
>
> libtraceevent:
>
>   Steven Rostedt (VMware):
>
>     - Fix "robust" test of do_generate_dynamic_list_file.
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> ----------------------------------------------------------------
> Adrian Hunter (1):
>       perf evsel: Add comment for 'idx' member in 'struct perf_sample_id
>
> Alexey Budankov (3):
>       perf record: Enable LBR callstack capture jointly with thread stack
>       perf report: Dump LBR callstack data by -D jointly with thread stack
>       perf report: Prefer DWARF callstacks to LBR ones when captured both
>
> Arnaldo Carvalho de Melo (10):
>       tools headers: Add limits.h to access __WORDSIZE
>       perf tools: tools/include should come before tools/uapi/include
>       tools headers: Grab copy of linux/const.h, needed by linux/bits.h
>       tools headers: Synchronize linux/bits.h with the kernel sources
>       tools arch x86: Sync asm/cpufeatures.h with the with the kernel
>       perf ui: Make 'exit_msg' optional in ui__question_window()
>       perf ui: Introduce non-interactive ui__info_window() function
>       perf ui browser: Allow specifying message to show when no samples are available to display
>       perf top: Show info message while collecting samples
>       tools headers: Fixup bitsperlong per arch includes
>
> Jiri Olsa (1):
>       libperf: Fix arch include paths
>
> Leo Yan (1):
>       perf cs-etm: Support sample flags 'insn' and 'insnlen'
>
> Steven Rostedt (VMware) (1):
>       tools lib traceevent: Fix "robust" test of do_generate_dynamic_list_file
>
>  tools/arch/x86/include/asm/cpufeatures.h |  3 +++
>  tools/include/linux/bitops.h             |  1 +
>  tools/include/linux/bits.h               | 17 +++++++++------
>  tools/include/linux/const.h              |
9 ++++++++ > tools/include/uapi/asm/bitsperlong.h | 18 ++++++++-------- > tools/include/uapi/linux/const.h | 31 ++++++++++++++++++++++++++ > tools/lib/traceevent/Makefile | 4 ++-- > tools/perf/Makefile.config | 2 +- > tools/perf/builtin-report.c | 2 ++ > tools/perf/check-headers.sh | 2 ++ > tools/perf/lib/Makefile | 2 +- > tools/perf/ui/browser.c | 2 ++ > tools/perf/ui/browser.h | 1 + > tools/perf/ui/browsers/hists.c | 3 +++ > tools/perf/ui/tui/util.c | 37 ++++++++++++++++++++++---------- > tools/perf/ui/util.h | 2 ++ > tools/perf/util/cs-etm.c | 35 +++++++++++++++++++++++++++++- > tools/perf/util/evsel.h | 7 ++++++ > tools/perf/util/parse-branch-options.c | 1 + > tools/perf/util/session.c | 31 +++++++++++++++----------- > 20 files changed, 166 insertions(+), 44 deletions(-) > create mode 100644 tools/include/linux/const.h > create mode 100644 tools/include/uapi/linux/const.h Pulled, thanks a lot Arnaldo! This one's very nice: > Arnaldo Carvalho de Melo (10): > perf top: Show info message while collecting samples :-) Ingo ^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes
  2019-08-20 19:39 ` Ingo Molnar
@ 2019-08-20 19:44   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-08-20 19:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Alexey Budankov, Guenter Roeck, Leo Yan, Mathieu Poirier, Steven Rostedt, Tzvetomir Stoyanov, Arnaldo Carvalho de Melo

On Tue, Aug 20, 2019 at 09:39:53PM +0200, Ingo Molnar wrote:
> Pulled, thanks a lot Arnaldo!

Wow, that was fast, thanks!

> This one's very nice:
>
> > Arnaldo Carvalho de Melo (10):
> > perf top: Show info message while collecting samples
>
> :-)

Yeah, we need to polish these kinds of little details; pressing 'C' and
getting callchains enabled/disabled would be nice as well in 'perf top',
just thought about that :-)

- Arnaldo

^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes
@ 2019-08-16 20:16 Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-08-16 20:16 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Florian Weimer, William Cohen, Haiyan Song, John Keeping, Arnaldo Carvalho de Melo

Hi Ingo, Thomas,

	Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 4511708b9a044f2bc83c7c7f7f8a2c45ec488219:

  Merge tag 'perf-core-for-mingo-5.4-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-08-15 11:10:38 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190816

for you to fetch changes up to e2736219e6ca3117e10651e215b96d66775220da:

  perf unwind: Remove unnecessary test (2019-08-16 12:30:14 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

report/script/trace/top:

  Arnaldo Carvalho de Melo:

    - Allow specifying marker events demarcating when to consider the other
      events, i.e. one now can state something like:

        # perf probe kernel_function
        # perf record -e cycles,probe:kernel_function

      And then, in 'perf script' or 'perf report' say:

        # perf report --switch-on=probe:kernel_function

      And then the cycles event samples will be considered only after we find
      the first probe:kernel_function event.

      There is also --switch-off=event, to make it stop considering events
      out of some window, say to avoid some winding down of a workload.

      The same can be done with the "live mode" tools: 'perf top' and
      'perf trace'.

      There are examples in the cset comments showing how to use it with SDT
      events in things like 'systemtap', that have those tracepoint-like
      events for the start/end of passes, etc.
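The windowing behaviour described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from tools/perf/util/evswitch.c: `struct marker_window`, `marker_window_init()` and `marker_window_discard()` are hypothetical names, events are identified by plain strings, and the sketch assumes that a later "on" marker re-opens the window after an "off" marker closed it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state for one --switch-on/--switch-off pair. */
struct marker_window {
	const char *on_name;	/* event that opens the window, or NULL */
	const char *off_name;	/* event that closes the window, or NULL */
	bool discarding;	/* true while outside the window */
};

/* Start outside the window only when an "on" marker was requested. */
void marker_window_init(struct marker_window *w,
			const char *on_name, const char *off_name)
{
	assert(w != NULL);
	w->on_name = on_name;
	w->off_name = off_name;
	w->discarding = (on_name != NULL);
}

/*
 * Return true when the event called 'name' should be dropped.
 * A marker event that flips the state is itself dropped.
 */
bool marker_window_discard(struct marker_window *w, const char *name)
{
	assert(w != NULL && name != NULL);

	if (w->discarding) {
		if (w->on_name && strcmp(name, w->on_name) == 0)
			w->discarding = false;	/* window opens */
		return true;
	}

	if (w->off_name && strcmp(name, w->off_name) == 0) {
		w->discarding = true;		/* window closes */
		return true;
	}

	return false;
}
```

With `on_name` set to `probe:kernel_function`, every `cycles` sample seen before the first probe hit is dropped and the ones after it are kept, which matches the `perf report --switch-on=probe:kernel_function` example in the message.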
Another example involves selecting scheduler events + entry/exit of a syscall, using the syscalls tracepoints, one can then see the scheduler events that take place while that syscall is being processed. In the future this should be possible in record/top/trace via eBPF where the perf tools would hook into the marker events and enable events put in place but not enabled when the on/off conditions are the desired ones, reducing the amount of events sampled, but this userspace only solution should be good enough for many scenarios. perf vendor events intel: Haiyan Song: - Add Tremontx event file v1.02. unwind: John Keeping: - Fix callchain unwinding when tid != pid, that was working only for the thread group leader. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Arnaldo Carvalho de Melo (13): perf script: Allow specifying event to switch on processing of other events perf script: Allow showing the --switch-on event perf script: Allow specifying event to switch off processing of other events perf evswitch: Move struct to a separate header to use in other tools perf evswitch: Move switch logic to use in other tools perf evswitch: Add the names of on/off events perf evswitch: Introduce OPTS_EVSWITCH() for cmd line processing perf evswitch: Introduce init() method to set the on/off evsels from the command line perf evswitch: Move enoent error message printing to separate function perf evswitch: Add hint when not finding specified on/off events perf trace: Add --switch-on/--switch-off events perf top: Add --switch-on/--switch-off events perf report: Add --switch-on/--switch-off events Haiyan Song (1): perf vendor events intel: Add Tremontx event file v1.02 John Keeping (3): perf map: Use zalloc for map_groups perf unwind: Fix libunwind when tid != pid perf unwind: Remove unnecessary test tools/perf/Documentation/perf-report.txt | 17 + tools/perf/Documentation/perf-script.txt | 9 + 
tools/perf/Documentation/perf-top.txt | 38 ++ tools/perf/Documentation/perf-trace.txt | 9 + tools/perf/builtin-report.c | 10 + tools/perf/builtin-script.c | 10 + tools/perf/builtin-top.c | 10 +- tools/perf/builtin-trace.c | 10 + tools/perf/pmu-events/arch/x86/mapfile.csv | 1 + tools/perf/pmu-events/arch/x86/tremontx/cache.json | 111 ++++++ .../pmu-events/arch/x86/tremontx/frontend.json | 26 ++ .../perf/pmu-events/arch/x86/tremontx/memory.json | 26 ++ tools/perf/pmu-events/arch/x86/tremontx/other.json | 26 ++ .../pmu-events/arch/x86/tremontx/pipeline.json | 111 ++++++ .../arch/x86/tremontx/uncore-memory.json | 73 ++++ .../pmu-events/arch/x86/tremontx/uncore-other.json | 431 +++++++++++++++++++++ .../pmu-events/arch/x86/tremontx/uncore-power.json | 11 + .../arch/x86/tremontx/virtual-memory.json | 86 ++++ tools/perf/util/Build | 1 + tools/perf/util/evswitch.c | 61 +++ tools/perf/util/evswitch.h | 31 ++ tools/perf/util/map.c | 5 +- tools/perf/util/map_groups.h | 4 + tools/perf/util/thread.c | 7 +- tools/perf/util/thread.h | 4 - tools/perf/util/top.h | 2 + tools/perf/util/unwind-libunwind-local.c | 18 +- tools/perf/util/unwind-libunwind.c | 40 +- tools/perf/util/unwind.h | 25 +- 29 files changed, 1158 insertions(+), 55 deletions(-) create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/cache.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/frontend.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/memory.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/other.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/pipeline.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-memory.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-other.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-power.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/virtual-memory.json create mode 100644 tools/perf/util/evswitch.c create mode 100644 
tools/perf/util/evswitch.h Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping having this in place. Clearlinux is failing when building with libpython, but that is not a perf regression, will try to remove one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf. 
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.3.0-rc4.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.1.1 20190808 gcc-9-branch@274204, clang version 8.0.0 (tags/RELEASE_800/final) 17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 19 debian:10 : Ok gcc 
(Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 20 debian:experimental : Ok gcc (Debian 8.3.0-19) 8.3.0, clang version 7.0.1-9 (tags/RELEASE_701/final) 21 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 22 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 24 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 25 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 26 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 27 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 28 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 29 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 30 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 31 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 32 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 33 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 34 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 35 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 36 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 37 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 
8.0.0-3.fc31.1) 39 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1) 40 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 41 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 42 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 43 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 44 manjaro:latest : Ok gcc (GCC) 9.1.0, clang version 8.0.1 (tags/RELEASE_801/final) 45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190723 [gcc-9-branch revision 273734], clang version 8.0.1 (tags/RELEASE_801/final 366581) 49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 51 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 52 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 53 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 54 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 55 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-powerpc : 
Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 62 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 63 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 73 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 74 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 76 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 77 ubuntu:19.10 : Ok gcc (Ubuntu 9.1.0-9ubuntu2) 9.1.0, clang version 8.0.1-+rc4-1 (tags/RELEASE_801/rc4) # uname -a Linux 
quaco 5.2.6-200.fc30.x86_64 #1 SMP Mon Aug 5 13:20:47 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 e2736219e6ca perf unwind: Remove unnecessary test # perf version --build-options perf version 5.3.rc4.ge2736219e6ca dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : 
Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok # $ time make -C tools/perf build-test make: 
Entering directory '/home/acme/git/perf/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
  make_no_gtk2_O: make NO_GTK2=1
  make_no_libaudit_O: make NO_LIBAUDIT=1
  make_cscope_O: make cscope
  make_debug_O: make DEBUG=1
  make_no_backtrace_O: make NO_BACKTRACE=1
  make_no_newt_O: make NO_NEWT=1
  make_no_libbpf_O: make NO_LIBBPF=1
  make_util_map_o_O: make util/map.o
  make_install_prefix_O: make install prefix=/tmp/krava
  make_with_clangllvm_O: make LIBCLANGLLVM=1
  make_no_libbionic_O: make NO_LIBBIONIC=1
  make_no_libnuma_O: make NO_LIBNUMA=1
  make_clean_all_O: make clean all
  make_help_O: make help
  make_no_libpython_O: make NO_LIBPYTHON=1
  make_util_pmu_bison_o_O: make util/pmu-bison.o
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
  make_install_bin_O: make install-bin
  make_no_demangle_O: make NO_DEMANGLE=1
  make_no_libperl_O: make NO_LIBPERL=1
  make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
  make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1
  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
  make_no_auxtrace_O: make NO_AUXTRACE=1
  make_static_O: make LDFLAGS=-static
  make_no_libelf_O: make NO_LIBELF=1
  make_no_slang_O: make NO_SLANG=1
  make_tags_O: make tags
  make_no_libunwind_O: make NO_LIBUNWIND=1
  make_doc_O: make doc
  make_install_prefix_slash_O: make install prefix=/tmp/krava/
  make_with_babeltrace_O: make LIBBABELTRACE=1
  make_perf_o_O: make perf.o
  make_install_O: make install
  make_pure_O: make
OK
make: Leaving directory '/home/acme/git/perf/tools/perf'
$
* [GIT PULL] perf/core improvements and fixes @ 2019-08-14 18:40 Arnaldo Carvalho de Melo 0 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-08-14 18:40 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Alexander Shishkin, Andy Shevchenko, Haiyan Song, Igor Lubashev, Leo Yan, Luke Mujica, Tan Xiaojun, Vince Weaver, Arnaldo Carvalho de Melo Hi, Please consider pulling, this has v5.3-rc4 merged in to pick up libbpf fixes, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit 272172bd418cc32aa466588150c8001bc229c712: Merge remote-tracking branch 'torvalds/master' into perf/core (2019-08-12 16:25:00 -0300) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.4-20190814 for you to fetch changes up to 1cd8fa288eb83c1fe0dfa492b09d228a8d802fbf: perf ui: No need to set ui_browser to 1 twice (2019-08-14 11:00:00 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: Intel PT: Adrian Hunter: - Add PEBS via Intel PT support, the kernel bits went via PeterZ. perf record: Alexander Shishkin: - Add an option to take an AUX snapshot on exit. Tan Xiaojun: - Support aarch64 random socket_id assignment, just like was fixed for S/390. tools: Andy Shevchenko: - Keep list of tools in alphabetical order on 'make -C tools help'. perf session: Arnaldo Carvalho de Melo: - Avoid infinite loop when seeing invalid header.size, reported by Vince Weaver using a perf.data fuzzer. Documentation: Vince Weaver: - Clarify HEADER_SAMPLE_TOPOLOGY format in the perf.data spec. perf config: Arnaldo Carvalho de Melo: - Honour $PERF_CONFIG env var to specify alternate .perfconfig. 
perf test: Arnaldo Carvalho de Melo: - Disable ~/.perfconfig to get default output in 'perf trace' tests. perf top: Arnaldo Carvalho de Melo: - Set display thread COMM to help with debugging. - Collapse and resort evsels in a group, so that we have output similar to 'perf report' when using event groups, i.e. perf top -e '{cycles,instructions}' Will have two columns, and the instructions one will work. core: Igor Lubashev: - Detect if libcap development files are available so that we can use capabilities to match the checks made by the kernel instead of using plain (geteuid() == 0). Intel: Haiyan Song: - Add Icelake V1.00 event file. perf trace: Leo Yan: - Fix segmentation fault when access syscall info on arm64. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Adrian Hunter (5): perf tools: Add aux_output attribute flag perf tools: Add itrace option 'o' to synthesize aux-output events perf intel-pt: Process options for PEBS event synthesis perf tools: Add aux-output config term perf intel-pt: Add brief documentation for PEBS via Intel PT Alexander Shishkin (1): perf record: Add an option to take an AUX snapshot on exit Andy Shevchenko (1): tools: Keep list of tools in alphabetical order Arnaldo Carvalho de Melo (13): perf session: Avoid infinite loop when seeing invalid header.size perf config: Honour $PERF_CONFIG env var to specify alternate .perfconfig perf config: Document the PERF_CONFIG environment variable perf test vfs_getname: Disable ~/.perfconfig to get default output perf top: Set display thread COMM to help with debugging perf hists: Do not link a pair if already linked perf hist: Remove dummy entries when finding real ones. 
perf top: Collapse and resort all evsels in a group perf tools: Add NO_LIBCAP=1 to the minimal build test perf tools: Add CAP_SYSLOG define for older systems perf ftrace: Improve error message about capability to use ftrace perf evsel: Provide meaningful warning when trying to use 'aux_output' on older kernels perf ui: No need to set ui_browser to 1 twice Haiyan Song (1): perf vendor events intel: Add Icelake V1.00 event file Igor Lubashev (3): tools build: Add capability-related feature detection perf tools: Add helpers to use capabilities if present perf ftrace: Use CAP_SYS_ADMIN instead of euid==0 Leo Yan (1): perf trace: Fix segmentation fault when access syscall info on arm64 Luke Mujica (1): perf tools: Fix paths in include statements Tan Xiaojun (1): perf record: Support aarch64 random socket_id assignment Vince Weaver (1): perf.data documentation: Clarify HEADER_SAMPLE_TOPOLOGY format tools/Makefile | 4 +- tools/build/Makefile.feature | 2 + tools/build/feature/Makefile | 4 + tools/build/feature/test-libcap.c | 20 + tools/include/uapi/linux/perf_event.h | 3 +- tools/perf/Documentation/intel-pt.txt | 15 + tools/perf/Documentation/itrace.txt | 2 + tools/perf/Documentation/perf-config.txt | 4 + tools/perf/Documentation/perf-record.txt | 13 +- tools/perf/Documentation/perf.data-file-format.txt | 25 +- tools/perf/Makefile.config | 11 + tools/perf/Makefile.perf | 2 + tools/perf/arch/x86/util/intel-pt.c | 23 + tools/perf/arch/x86/util/kvm-stat.c | 4 +- tools/perf/arch/x86/util/tsc.c | 6 +- tools/perf/builtin-ftrace.c | 12 +- tools/perf/builtin-record.c | 35 +- tools/perf/builtin-top.c | 34 +- tools/perf/builtin-trace.c | 2 +- tools/perf/perf.c | 3 + tools/perf/perf.h | 1 + tools/perf/pmu-events/arch/x86/icelake/cache.json | 552 +++++++++++++ .../arch/x86/icelake/floating-point.json | 102 +++ .../perf/pmu-events/arch/x86/icelake/frontend.json | 424 ++++++++++ tools/perf/pmu-events/arch/x86/icelake/memory.json | 410 ++++++++++ 
tools/perf/pmu-events/arch/x86/icelake/other.json | 121 +++ .../perf/pmu-events/arch/x86/icelake/pipeline.json | 892 +++++++++++++++++++++ .../arch/x86/icelake/virtual-memory.json | 236 ++++++ tools/perf/pmu-events/arch/x86/mapfile.csv | 2 + tools/perf/tests/make | 1 + tools/perf/tests/shell/trace+probe_vfs_getname.sh | 4 + tools/perf/ui/helpline.c | 4 +- tools/perf/ui/setup.c | 2 +- tools/perf/ui/util.c | 2 +- tools/perf/util/Build | 2 + tools/perf/util/auxtrace.c | 18 +- tools/perf/util/auxtrace.h | 5 +- tools/perf/util/cap.c | 29 + tools/perf/util/cap.h | 32 + tools/perf/util/event.h | 1 + tools/perf/util/evsel.c | 15 +- tools/perf/util/evsel.h | 3 + tools/perf/util/header.c | 4 +- tools/perf/util/hist.c | 20 +- tools/perf/util/intel-pt.c | 18 + tools/perf/util/parse-events.c | 8 + tools/perf/util/parse-events.h | 1 + tools/perf/util/parse-events.l | 1 + tools/perf/util/python-ext-sources | 1 + tools/perf/util/session.c | 11 +- tools/perf/util/setup.py | 2 + tools/perf/util/util.c | 9 + 52 files changed, 3112 insertions(+), 45 deletions(-) create mode 100644 tools/build/feature/test-libcap.c create mode 100644 tools/perf/pmu-events/arch/x86/icelake/cache.json create mode 100644 tools/perf/pmu-events/arch/x86/icelake/floating-point.json create mode 100644 tools/perf/pmu-events/arch/x86/icelake/frontend.json create mode 100644 tools/perf/pmu-events/arch/x86/icelake/memory.json create mode 100644 tools/perf/pmu-events/arch/x86/icelake/other.json create mode 100644 tools/perf/pmu-events/arch/x86/icelake/pipeline.json create mode 100644 tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json create mode 100644 tools/perf/util/cap.c create mode 100644 tools/perf/util/cap.h Test results: The first ones are container based builds of tools/perf with and without libelf support. 
Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping having this in place. Clearlinux is failing when building with libpython, but that is not a perf regression, will try to remove one compiler warning that is causing the problem when building some of the glue code files in the python files, outside perf. 
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.3.0-rc4.tar.xz # dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.1 (tags/RELEASE_801/final) (based on LLVM 8.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.1.1 20190808 gcc-9-branch@274204, clang version 8.0.0 (tags/RELEASE_800/final) 17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 19 debian:10 : Ok gcc 
(Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 20 debian:experimental : Ok gcc (Debian 8.3.0-19) 8.3.0, clang version 7.0.1-9 (tags/RELEASE_701/final) 21 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 22 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 23 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 24 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0 25 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 26 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 27 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 28 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 29 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 30 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 31 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 32 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 33 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 34 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 35 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 36 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 37 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 
8.0.0-3.fc31.1) 39 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31.1) 40 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 41 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 42 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 43 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 44 manjaro:latest : Ok gcc (GCC) 9.1.0, clang version 8.0.1 (tags/RELEASE_801/final) 45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190723 [gcc-9-branch revision 273734], clang version 8.0.1 (tags/RELEASE_801/final 366581) 49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 51 oraclelinux:8 : Ok gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3.0.1), clang version 7.0.1 (tags/RELEASE_701/final) 52 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 53 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 54 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 55 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 
ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 61 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 62 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 63 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 72 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 73 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 74 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 76 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 77 ubuntu:19.10 : Ok gcc (Ubuntu 9.1.0-9ubuntu2) 9.1.0, clang version 8.0.1-+rc4-1 (tags/RELEASE_801/rc4) # uname -a Linux quaco 5.2.6-200.fc30.x86_64 #1 SMP Mon Aug 5 13:20:47 UTC 2019 x86_64 
x86_64 x86_64 GNU/Linux # git log --oneline -1 1cd8fa288eb8 perf ui: No need to set ui_browser to 1 twice # perf version --build-options perf version 5.3.rc4.g1cd8fa288eb8 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: 
Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok # $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: 
./tests/perf-targz-src-pkg . make_clean_all_O: make clean all make_no_backtrace_O: make NO_BACKTRACE=1 make_tags_O: make tags make_install_O: make install make_no_libpython_O: make NO_LIBPYTHON=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_static_O: make LDFLAGS=-static make_doc_O: make doc make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_cscope_O: make cscope make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_demangle_O: make NO_DEMANGLE=1 make_help_O: make help make_no_libelf_O: make NO_LIBELF=1 make_util_map_o_O: make util/map.o make_pure_O: make make_with_clangllvm_O: make LIBCLANGLLVM=1 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_install_bin_O: make install-bin make_install_prefix_O: make install prefix=/tmp/krava make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_debug_O: make DEBUG=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_perf_o_O: make perf.o make_no_newt_O: make NO_NEWT=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_gtk2_O: make NO_GTK2=1 make_no_libbpf_O: make NO_LIBBPF=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_no_libperl_O: make NO_LIBPERL=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_no_slang_O: make NO_SLANG=1 make_no_libunwind_O: make NO_LIBUNWIND=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes @ 2019-07-22 17:38 Arnaldo Carvalho de Melo 0 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-07-22 17:38 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Alexey Budankov, Andi Kleen, Cong Wang, Denis Bakhvalov, Numfor Mbiziwo-Tiapo Hi Ingo, Please consider pulling, Best regards, - Arnaldo ^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes @ 2019-07-15 21:11 Arnaldo Carvalho de Melo 0 siblings, 0 replies; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-07-15 21:11 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Mamatha Inamdar, Ravi Bangoria, Thomas Richter, YueHaibing, Arnaldo Carvalho de Melo Hi Ingo, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit 323fd749821daab0f327ec86d707c4542963cdb0: perf intel-pt: Fix potential NULL pointer dereference found by the smatch tool (2019-07-09 10:13:28 -0300) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190715 for you to fetch changes up to 916c31fff946fae0e05862f9b2435fdb29fd5090: perf version: Fix segfault due to missing OPT_END() (2019-07-15 07:59:05 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf db-export: Adrian Hunter: - Improvements in how COMM details are exported to databases for post processing and use in the exported-sql-viewer.py UI. - Export switch events to the database. BPF: Arnaldo Carvalho de Melo: - Bump rlimit(MEMLOCK) for 'perf test bpf' and 'perf trace', just like selftests/bpf/bpf_rlimit.h does, which makes the kinda cryptic errors caused by exhausting this limit (EPERM sometimes) less frequent. perf version: Ravi Bangoria: - Fix segfault due to missing OPT_END(), noticed on PowerPC. perf vendor events: Thomas Richter: - Add JSON files for IBM s/390 machine type 8561. perf cs-etm (ARM): YueHaibing: - Fix two cases of error returns not being done properly: invalid ERR_PTR() use and loss of error code propagation.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Adrian Hunter (21): perf db-export: Get rid of db_export__deferred() perf db-export: Rename db_export__comm() to db_export__exec_comm() perf db-export: Pass main_thread to db_export__thread() perf db-export: Export main_thread in db_export__sample() perf db-export: Export comm before exporting thread perf db-export: Move export__comm_thread into db_export__sample() perf db-export: Fix a white space issue in db_export__sample() perf db-export: Export comm details perf scripts python: export-to-sqlite.py: Export comm details perf scripts python: export-to-postgresql.py: Export comm details perf db-export: Factor out db_export__comm() perf db-export: Also export thread's current comm perf scripts python: export-to-sqlite.py: Add has_calls column to comms table perf scripts python: export-to-postgresql.py: Add has_calls column to comms table perf scripts python: exported-sql-viewer.py: Remove redundant semi-colons perf scripts python: exported-sql-viewer.py: Use new 'has_calls' column perf script: Add scripting operation process_switch() perf db-export: Factor out db_export__threads() perf db-export: Export switch events perf scripts python: export-to-sqlite.py: Export switch events perf scripts python: export-to-postgresql.py: Export switch events Arnaldo Carvalho de Melo (3): perf tools: Introduce rlimit__bump_memlock() helper perf test: Auto bump rlimit(MEMLOCK) for BPF test sake perf trace: Auto bump rlimit(MEMLOCK) for eBPF maps sake Ravi Bangoria (1): perf version: Fix segfault due to missing OPT_END() Thomas Richter (1): perf vendor events s390: Add JSON files for machine type 8561 YueHaibing (2): perf cs-etm: Remove errnoeous ERR_PTR() usage in cs_etm__process_auxtrace_info perf cs-etm: Return errcode in cs_etm__process_auxtrace_info() tools/perf/builtin-script.c | 8 +- tools/perf/builtin-trace.c | 10 + tools/perf/builtin-version.c | 1 + 
.../perf/pmu-events/arch/s390/cf_m8561/basic.json | 58 ++++ .../perf/pmu-events/arch/s390/cf_m8561/crypto.json | 114 +++++++ .../pmu-events/arch/s390/cf_m8561/crypto6.json | 30 ++ .../pmu-events/arch/s390/cf_m8561/extended.json | 373 +++++++++++++++++++++ tools/perf/pmu-events/arch/s390/mapfile.csv | 1 + tools/perf/scripts/python/export-to-postgresql.py | 68 +++- tools/perf/scripts/python/export-to-sqlite.py | 54 ++- tools/perf/scripts/python/exported-sql-viewer.py | 34 +- tools/perf/tests/builtin-test.c | 6 + tools/perf/util/Build | 1 + tools/perf/util/cs-etm.c | 12 +- tools/perf/util/db-export.c | 291 ++++++++++------ tools/perf/util/db-export.h | 19 +- tools/perf/util/rlimit.c | 29 ++ tools/perf/util/rlimit.h | 6 + .../util/scripting-engines/trace-event-python.c | 53 ++- tools/perf/util/trace-event.h | 3 + 20 files changed, 1029 insertions(+), 142 deletions(-) create mode 100644 tools/perf/pmu-events/arch/s390/cf_m8561/basic.json create mode 100644 tools/perf/pmu-events/arch/s390/cf_m8561/crypto.json create mode 100644 tools/perf/pmu-events/arch/s390/cf_m8561/crypto6.json create mode 100644 tools/perf/pmu-events/arch/s390/cf_m8561/extended.json create mode 100644 tools/perf/util/rlimit.c create mode 100644 tools/perf/util/rlimit.h Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. 
Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there are the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping to have this in place. The 'perf test bpf' test is about rlimit(MEMLOCK), bump it to 128K from the default 64K and it'll work. Next pull req will have auto-adjustment for 'perf test' and 'perf trace', where BPF programs creating maps are also failing.
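For anyone wanting to try the rlimit(MEMLOCK) bump by hand before running 'perf test bpf', here is a minimal sketch in Python using the standard `resource` module. It is not the perf helper (that one lives in tools/perf/util/rlimit.c and is written in C). The function names `bumped_memlock()` and `bump_memlock()` are made up for illustration, and the 128K target is just the value mentioned above. It assumes a process without extra privileges, which can raise its soft limit only up to the hard limit.

```python
import resource

def bumped_memlock(soft, hard, target=128 * 1024):
    """Return the (soft, hard) pair to request so the soft limit gets as
    close to `target` bytes as the hard limit allows. Hypothetical helper."""
    inf = resource.RLIM_INFINITY
    # Nothing to do when the soft limit is unlimited or already big enough.
    if soft == inf or soft >= target:
        return soft, hard
    # The hard limit leaves room: raise only the soft limit.
    if hard == inf or hard >= target:
        return target, hard
    # Otherwise settle for the hard limit, the most we can ask for
    # without extra privileges.
    return hard, hard

def bump_memlock(target=128 * 1024):
    """Try to apply the bump to the current process and return the limits
    in effect afterwards. Hypothetical helper, never lowers a limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    wanted = bumped_memlock(soft, hard, target)
    if wanted != (soft, hard):
        try:
            resource.setrlimit(resource.RLIMIT_MEMLOCK, wanted)
        except (ValueError, OSError):
            # Keep going with the old limits; BPF map creation may still fail.
            pass
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)
```

The limit only applies to the calling process and its children, so something like this has to run in the process that later launches the workload; from a shell, 'ulimit -l' serves the same purpose.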
$ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0.tar.xz $ dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 7.0.1 (tags/RELEASE_701/final) (based on LLVM 7.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.1.1 20190628 gcc-9-branch@272773, clang version 8.0.0 (tags/RELEASE_800/final) 17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 19 debian:10 : Ok gcc 
(Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 20 debian:experimental : Ok gcc (Debian 8.3.0-7) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 21 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 22 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 23 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 24 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 25 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 26 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 27 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 28 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 29 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 30 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 31 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 32 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 33 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 34 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 35 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 36 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 37 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 38 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31) 39 
fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31) 40 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 41 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 42 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 43 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 44 manjaro:latest : Ok gcc (GCC) 9.1.0, clang version 8.0.0 (tags/RELEASE_800/final) 45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190611 [gcc-9-branch revision 272147], clang version 8.0.0 (tags/RELEASE_800/final 356365) 49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 51 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 52 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 53 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 54 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 55 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-powerpc64el : Ok 
powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 61 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 62 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 63 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 72 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 73 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 74 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 76 ubuntu:19.10 : Ok gcc (Ubuntu 8.3.0-14ubuntu1) 8.3.0, clang version 8.0.1-+rc1-1~exp1 (tags/RELEASE_801/rc1) $ # uname -a Linux quaco 5.2.0-rc7+ #4 SMP Sat Jul 6 14:43:41 -03 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 916c31fff946 perf version: Fix segfault due to missing OPT_END() # perf version --build-options perf 
version 5.2.g916c31fff946 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 
28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok # $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . 
- /home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP: make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP feature-dump make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_no_gtk2_O: make NO_GTK2=1 make_install_prefix_O: make install prefix=/tmp/krava make_install_bin_O: make install-bin make_clean_all_O: make clean all make_doc_O: make doc make_install_O: make install make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_no_libelf_O: make NO_LIBELF=1 make_static_O: make LDFLAGS=-static make_pure_O: make make_no_libbpf_O: make NO_LIBBPF=1 make_help_O: make help make_no_slang_O: make NO_SLANG=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_tags_O: make tags make_no_libunwind_O: make NO_LIBUNWIND=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_no_newt_O: make NO_NEWT=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_cscope_O: make cscope make_perf_o_O: make perf.o make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_no_libpython_O: make NO_LIBPYTHON=1 make_no_demangle_O: make NO_DEMANGLE=1 make_no_libperl_O: make NO_LIBPERL=1 make_debug_O: make DEBUG=1 make_util_map_o_O: make util/map.o make_util_pmu_bison_o_O: make util/pmu-bison.o OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes @ 2019-07-09 18:31 Arnaldo Carvalho de Melo 2019-07-13 9:13 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-07-09 18:31 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, David Carrillo Cisneros, Leo Yan, Luke Mujica, Numfor Mbiziwo-Tiapo, Song Liu, Arnaldo Carvalho de Melo Hi Ingo, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit d1d59b817939821bee149e870ce7723f61ffb512: Merge tag 'perf-urgent-for-mingo-5.3-20190708-2' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-07-09 13:22:03 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190709 for you to fetch changes up to 323fd749821daab0f327ec86d707c4542963cdb0: perf intel-pt: Fix potential NULL pointer dereference found by the smatch tool (2019-07-09 10:13:28 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: Intel PT: Adrian Hunter: - Fix DROP VIEW power_events_view in the postgresql and sqlite export-db python scripts. perf script: Song Liu: - Assume native_arch for pipe mode, fixing a segfault. perf inject: Arnaldo Carvalho de Melo: - The tool->read() call may pass a NULL evsel, handle it. core: Arnaldo Carvalho de Melo: - Move zalloc/zfree.c to tools/lib, further eroding tools/perf/util.[ch] - Use zfree() where applicable instead of the open coded equivalent. - Add stdlib.h and some other headers to places where they are needed and were being obtained via util.h, which doesn't need them anymore. - Use list_del_init() more thoroughly. Miscellaneous: Leo Yan: - Fix use after free and potential NULL pointer derefs detected by the smatch tool in various places.
  Luke Mujica:

  - Remove a couple unused variables in the parse-events code.

  Numfor Mbiziwo-Tiapo:

  - Initialize variable to suppress memory sanitizer warning in the
    mmap-thread-lookup 'perf test' entry.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (2):
      perf scripts python: export-to-postgresql.py: Fix DROP VIEW power_events_view
      perf scripts python: export-to-sqlite.py: Fix DROP VIEW power_events_view

Arnaldo Carvalho de Melo (9):
      perf inject: The tool->read() call may pass a NULL evsel, handle it
      perf evsel: perf_evsel__name(NULL) is valid, no need to check evsel
      perf tools: Add missing headers, mostly stdlib.h
      perf namespaces: Move the conditional setns() prototype to namespaces.h
      perf tools: Move get_current_dir_name() cond prototype out of util.h
      tools lib: Adopt zalloc()/zfree() from tools/perf
      perf tools: Use zfree() where applicable
      perf tools: Use list_del_init() more thorougly
      perf metricgroup: Add missing list_del_init() when flushing egroups list

Leo Yan (10):
      perf stat: Fix use-after-freed pointer detected by the smatch tool
      perf top: Fix potential NULL pointer dereference detected by the smatch tool
      perf annotate: Fix dereferencing freed memory found by the smatch tool
      perf trace: Fix potential NULL pointer dereference found by the smatch tool
      perf map: Fix potential NULL pointer dereference found by smatch tool
      perf session: Fix potential NULL pointer dereference found by the smatch tool
      perf cs-etm: Fix potential NULL pointer dereference found by the smatch tool
      perf hists browser: Fix potential NULL pointer dereference found by the smatch tool
      perf intel-bts: Fix potential NULL pointer dereference found by the smatch tool
      perf intel-pt: Fix potential NULL pointer dereference found by the smatch tool

Luke Mujica (2):
      perf parse-events: Remove unused variable 'i'
      perf parse-events: Remove unused variable: error

Numfor Mbiziwo-Tiapo (1):
      perf test mmap-thread-lookup: Initialize variable to suppress memory sanitizer warning

Song Liu (1):
      perf script: Assume native_arch for pipe mode

 tools/include/linux/zalloc.h | 12 +++++
 tools/lib/zalloc.c | 15 ++++++
 tools/perf/MANIFEST | 1 +
 tools/perf/arch/arm/annotate/instructions.c | 1 +
 tools/perf/arch/arm/util/auxtrace.c | 1 +
 tools/perf/arch/arm/util/cs-etm.c | 1 +
 tools/perf/arch/arm64/util/arm-spe.c | 1 +
 tools/perf/arch/common.c | 3 +-
 tools/perf/arch/powerpc/util/perf_regs.c | 4 +-
 tools/perf/arch/s390/util/auxtrace.c | 1 +
 tools/perf/arch/s390/util/header.c | 3 +-
 tools/perf/arch/x86/util/event.c | 2 +-
 tools/perf/arch/x86/util/intel-bts.c | 2 +-
 tools/perf/arch/x86/util/intel-pt.c | 2 +-
 tools/perf/arch/x86/util/perf_regs.c | 2 +-
 tools/perf/bench/futex-hash.c | 3 +-
 tools/perf/bench/futex-lock-pi.c | 3 +-
 tools/perf/bench/mem-functions.c | 2 +-
 tools/perf/bench/numa.c | 2 +-
 tools/perf/builtin-annotate.c | 2 +-
 tools/perf/builtin-bench.c | 2 +-
 tools/perf/builtin-c2c.c | 2 +-
 tools/perf/builtin-config.c | 1 +
 tools/perf/builtin-diff.c | 2 +-
 tools/perf/builtin-ftrace.c | 2 +-
 tools/perf/builtin-help.c | 2 +
 tools/perf/builtin-inject.c | 2 +-
 tools/perf/builtin-kmem.c | 2 +-
 tools/perf/builtin-kvm.c | 2 +-
 tools/perf/builtin-lock.c | 10 ++--
 tools/perf/builtin-probe.c | 2 +-
 tools/perf/builtin-record.c | 4 +-
 tools/perf/builtin-report.c | 4 +-
 tools/perf/builtin-sched.c | 2 +-
 tools/perf/builtin-script.c | 5 +-
 tools/perf/builtin-stat.c | 8 ++--
 tools/perf/builtin-timechart.c | 4 +-
 tools/perf/builtin-top.c | 8 +++-
 tools/perf/builtin-trace.c | 7 +--
 tools/perf/perf.c | 2 +-
 tools/perf/pmu-events/jevents.c | 2 +-
 tools/perf/scripts/python/export-to-postgresql.py | 2 +-
 tools/perf/scripts/python/export-to-sqlite.py | 2 +-
 tools/perf/tests/dwarf-unwind.c | 5 +-
 tools/perf/tests/expr.c | 3 +-
 tools/perf/tests/llvm.c | 1 +
 tools/perf/tests/mem2node.c | 3 +-
 tools/perf/tests/mmap-thread-lookup.c | 2 +-
 tools/perf/tests/sample-parsing.c | 1 +
 tools/perf/tests/switch-tracking.c | 3 +-
 tools/perf/tests/thread-map.c | 3 +-
 tools/perf/tests/vmlinux-kallsyms.c | 1 +
 tools/perf/ui/browser.c | 2 +-
 tools/perf/ui/browser.h | 1 +
 tools/perf/ui/browsers/annotate.c | 2 +-
 tools/perf/ui/browsers/hists.c | 17 +++++--
 tools/perf/ui/browsers/map.c | 1 +
 tools/perf/ui/browsers/res_sample.c | 6 +--
 tools/perf/ui/browsers/scripts.c | 4 +-
 tools/perf/ui/gtk/annotate.c | 2 +-
 tools/perf/ui/gtk/util.c | 3 +-
 tools/perf/ui/stdio/hist.c | 2 +-
 tools/perf/ui/tui/setup.c | 1 +
 tools/perf/ui/tui/util.c | 2 +-
 tools/perf/util/Build | 5 ++
 tools/perf/util/annotate.c | 13 ++---
 tools/perf/util/arm-spe.c | 2 +-
 tools/perf/util/auxtrace.c | 11 ++---
 tools/perf/util/bpf-loader.c | 3 +-
 tools/perf/util/build-id.c | 1 +
 tools/perf/util/call-path.c | 5 +-
 tools/perf/util/callchain.c | 12 ++---
 tools/perf/util/cgroup.c | 4 +-
 tools/perf/util/comm.c | 2 +-
 tools/perf/util/config.c | 3 +-
 tools/perf/util/counts.c | 2 +-
 tools/perf/util/cpumap.c | 2 +-
 tools/perf/util/cputopo.c | 5 +-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 1 +
 tools/perf/util/cs-etm.c | 8 ++--
 tools/perf/util/data-convert-bt.c | 4 +-
 tools/perf/util/data.c | 3 +-
 tools/perf/util/db-export.c | 7 +--
 tools/perf/util/debug.c | 1 +
 tools/perf/util/demangle-java.c | 3 +-
 tools/perf/util/dso.c | 5 +-
 tools/perf/util/dwarf-aux.c | 2 +-
 tools/perf/util/env.c | 11 +++--
 tools/perf/util/event.c | 3 +-
 tools/perf/util/evlist.c | 2 +-
 tools/perf/util/evsel.c | 4 +-
 tools/perf/util/get_current_dir_name.c | 6 +--
 tools/perf/util/get_current_dir_name.h | 8 ++++
 tools/perf/util/header.c | 8 ++--
 tools/perf/util/help-unknown-cmd.c | 2 +
 tools/perf/util/hist.c | 20 ++++----
 tools/perf/util/intel-bts.c | 7 ++-
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 2 +-
 tools/perf/util/intel-pt.c | 15 +++---
 tools/perf/util/jitdump.c | 7 ++-
 tools/perf/util/llvm-utils.c | 4 +-
 tools/perf/util/machine.c | 6 +--
 tools/perf/util/map.c | 9 ++--
 tools/perf/util/mem2node.c | 2 +-
 tools/perf/util/metricgroup.c | 10 ++--
 tools/perf/util/mmap.c | 1 +
 tools/perf/util/namespaces.c | 3 +-
 tools/perf/util/namespaces.h | 4 ++
 tools/perf/util/ordered-events.c | 6 +--
 tools/perf/util/parse-branch-options.c | 2 +-
 tools/perf/util/parse-events.c | 3 +-
 tools/perf/util/parse-events.y | 2 -
 tools/perf/util/parse-regs-options.c | 8 +++-
 tools/perf/util/pmu.c | 4 +-
 tools/perf/util/probe-event.c | 55 ++++++++++------------
 tools/perf/util/probe-file.c | 2 +-
 tools/perf/util/probe-finder.c | 2 +-
 tools/perf/util/pstack.c | 2 +-
 tools/perf/util/python-ext-sources | 1 +
 tools/perf/util/s390-cpumsf.c | 11 ++---
 tools/perf/util/session.c | 7 ++-
 tools/perf/util/setns.c | 4 +-
 tools/perf/util/srccode.c | 11 +++--
 tools/perf/util/srcline.c | 2 +-
 tools/perf/util/stat-shadow.c | 3 +-
 tools/perf/util/stat.c | 3 +-
 tools/perf/util/strbuf.c | 3 +-
 tools/perf/util/strfilter.c | 3 +-
 tools/perf/util/strlist.c | 2 +-
 tools/perf/util/svghelper.c | 2 +-
 tools/perf/util/symbol-elf.c | 18 +++----
 tools/perf/util/symbol-minimal.c | 3 +-
 tools/perf/util/symbol.c | 1 +
 tools/perf/util/syscalltbl.c | 2 +-
 tools/perf/util/target.c | 2 +-
 tools/perf/util/thread-stack.c | 3 +-
 tools/perf/util/thread.c | 6 +--
 tools/perf/util/thread_map.c | 4 +-
 tools/perf/util/trace-event-info.c | 1 +
 tools/perf/util/trace-event-scripting.c | 2 +-
 tools/perf/util/unwind-libdw.c | 1 +
 tools/perf/util/unwind-libunwind-local.c | 3 +-
 tools/perf/util/usage.c | 3 ++
 tools/perf/util/util.h | 17 -------
 tools/perf/util/values.c | 2 +-
 tools/perf/util/vdso.c | 1 +
 tools/perf/util/xyarray.c | 2 +-
 147 files changed, 375 insertions(+), 279 deletions(-)
 create mode 100644 tools/include/linux/zalloc.h
 create mode 100644 tools/lib/zalloc.c
 create mode 100644 tools/perf/util/get_current_dir_name.h

Test results:

The first ones are container based builds of tools/perf with and without libelf support.
Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from an HTTP server to build them inside the container, to make it easier to build in a container cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event_open syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have them run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping to put this in place.

The 'perf test bpf' test is about rlimit(MEMLOCK), bump it to 128K from the default 64K and it'll work. Next pull req will have auto-adjustment for 'perf test' and 'perf trace', where BPF programs creating maps are also failing.
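
The rlimit(MEMLOCK) bump described above can be sketched roughly as follows. This is only an illustration in Python, not the auto-adjustment code planned for 'perf test' and 'perf trace'; the names WANTED_MEMLOCK, memlock_target() and bump_memlock() are hypothetical, and the 128K target is simply the figure quoted above. It assumes a Linux system where Python's resource module exposes RLIMIT_MEMLOCK.

```python
import resource

# Target soft limit in bytes: the 128K figure quoted above (an assumption,
# not a value taken from the perf sources).
WANTED_MEMLOCK = 128 * 1024


def memlock_target(soft, hard, wanted=WANTED_MEMLOCK):
    """Return the soft limit to request, or None if nothing should change.

    Hypothetical helper: an unprivileged process may only raise its soft
    limit up to the hard limit, so the target is clamped to it.
    """
    inf = resource.RLIM_INFINITY
    if soft == inf or soft >= wanted:
        return None  # already unlimited or large enough
    if hard != inf and hard < wanted:
        # Cannot reach the wanted value; take whatever headroom exists.
        return hard if hard > soft else None
    return wanted


def bump_memlock(wanted=WANTED_MEMLOCK):
    """Raise the soft RLIMIT_MEMLOCK of this process if it is too low."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    target = memlock_target(soft, hard, wanted)
    if target is not None:
        resource.setrlimit(resource.RLIMIT_MEMLOCK, (target, hard))
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    print(bump_memlock())
```

The limit is per process and inherited by children, so a wrapper like this would have to adjust it before starting the workload that loads BPF programs. From a shell, 'ulimit -l 128' (the unit is KiB) has a similar effect when the hard limit allows it.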
$ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0.tar.xz
$ dm
 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 7.0.1 (tags/RELEASE_701/final) (based on LLVM 7.0.1)
 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.1.1 20190628 gcc-9-branch@272773, clang version 8.0.0 (tags/RELEASE_800/final)
17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
19 debian:experimental : Ok gcc (Debian 8.3.0-7) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final)
20 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0
21 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0
22 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0
23 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0
24 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
25 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
26 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
27 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
28 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
29 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
30 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
31 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
32 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
33 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
34 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30)
35 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
36 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
37 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31)
38 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31)
39 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0
40 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
41 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
42 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7)
43 manjaro:latest : Ok gcc (GCC) 9.1.0, clang version 8.0.0 (tags/RELEASE_800/final)
44 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548)
45 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
46 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553)
47 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190611 [gcc-9-branch revision 272147], clang version 8.0.0 (tags/RELEASE_800/final 356365)
48 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
49 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final)
50 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
51 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4)
52 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
53 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
54 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
55 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
56 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
57 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
58 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
59 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
60 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
61 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
62 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
63 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
64 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
65 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
66 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
67 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
68 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
69 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
70 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final)
71 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final)
72 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
73 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0
74 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
75 ubuntu:19.10 : Ok gcc (Ubuntu 8.3.0-14ubuntu1) 8.3.0, clang version 8.0.1-+rc1-1~exp1 (tags/RELEASE_801/rc1)
$

# uname -a
Linux quaco 5.2.0-rc7+ #4 SMP Sat Jul 6 14:43:41 -03 2019 x86_64 x86_64 x86_64 GNU/Linux
# git log --oneline -1
323fd749821d perf intel-pt: Fix potential NULL pointer dereference found by the smatch tool
# perf version --build-options
perf version 5.2.g323fd749821d
 dwarf: [ on ] # HAVE_DWARF_SUPPORT
 dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT
 glibc: [ on ] # HAVE_GLIBC_SUPPORT
 gtk2: [ on ] # HAVE_GTK2_SUPPORT
 syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT
 libbfd: [ on ] # HAVE_LIBBFD_SUPPORT
 libelf: [ on ] # HAVE_LIBELF_SUPPORT
 libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
 numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
 libperl: [ on ] # HAVE_LIBPERL_SUPPORT
 libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
 libslang: [ on ] # HAVE_SLANG_SUPPORT
 libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT
 libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT
 libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT
 zlib: [ on ] # HAVE_ZLIB_SUPPORT
 lzma: [ on ] # HAVE_LZMA_SUPPORT
 get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT
 bpf: [ on ] # HAVE_LIBBPF_SUPPORT
 aio: [ on ] # HAVE_AIO_SUPPORT
 zstd: [ on ] # HAVE_ZSTD_SUPPORT
# perf test
 1: vmlinux symtab matches kallsyms : Ok
 2: Detect openat syscall event : Ok
 3: Detect openat syscall event on all cpus : Ok
 4: Read samples using the mmap interface : Ok
 5: Test data source output : Ok
 6: Parse event definition strings : Ok
 7: Simple expression parser : Ok
 8: PERF_RECORD_* events & perf_sample fields : Ok
 9: Parse perf pmu format : Ok
10: DSO data read : Ok
11: DSO data cache : Ok
12: DSO data reopen : Ok
13: Roundtrip evsel->name : Ok
14: Parse sched tracepoints fields : Ok
15: syscalls:sys_enter_openat event fields : Ok
16: Setup struct perf_event_attr : Ok
17: Match and link multiple hists : Ok
18: 'import perf' in python : Ok
19: Breakpoint overflow signal handler : Ok
20: Breakpoint overflow sampling : Ok
21: Breakpoint accounting : Ok
22: Watchpoint :
22.1: Read Only Watchpoint : Skip
22.2: Write Only Watchpoint : Ok
22.3: Read / Write Watchpoint : Ok
22.4: Modify Watchpoint : Ok
23: Number of exit events of a simple workload : Ok
24: Software clock events period values : Ok
25: Object code reading : Ok
26: Sample parsing : Ok
27: Use a dummy software event to keep tracking : Ok
28: Parse with no sample_id_all bit set : Ok
29: Filter hist entries : Ok
30: Lookup mmap thread : Ok
31: Share thread mg : Ok
32: Sort output of hist entries : Ok
33: Cumulate child hist entries : Ok
34: Track with sched_switch : Ok
35: Filter fds with revents mask in a fdarray : Ok
36: Add fd to a fdarray, making it autogrow : Ok
37: kmod_path__parse : Ok
38: Thread map : Ok
39: LLVM search and compile :
39.1: Basic BPF llvm compile : Ok
39.2: kbuild searching : Ok
39.3: Compile source for BPF prologue generation : Ok
39.4: Compile source for BPF relocation : Ok
40: Session topology : Ok
41: BPF filter :
41.1: Basic BPF filtering : Skip
41.2: BPF pinning : Skip
41.3: BPF prologue generation : Skip
41.4: BPF relocation checker : Skip
42: Synthesize thread map : Ok
43: Remove thread map : Ok
44: Synthesize cpu map : Ok
45: Synthesize stat config : Ok
46: Synthesize stat : Ok
47: Synthesize stat round : Ok
48: Synthesize attr update : Ok
49: Event times : Ok
50: Read backward ring buffer : Ok
51: Print cpu map : Ok
52: Probe SDT events : Ok
53: is_printable_array : Ok
54: Print bitmap : Ok
55: perf hooks : Ok
56: builtin clang support : Skip (not compiled in)
57: unit_number__scnprintf : Ok
58: mem2node : Ok
59: time utils : Ok
60: map_groups__merge_in : Ok
61: x86 rdpmc : Ok
62: Convert perf time to TSC : Ok
63: DWARF unwind : Ok
64: x86 instruction decoder - new instructions : Ok
65: Intel PT packet decoder : Ok
66: x86 bp modify : Ok
67: probe libc's inet_pton & backtrace it with ping : Ok
68: Use vfs_getname probe to get syscall args filenames : Ok
69: Add vfs_getname probe to get syscall args filenames : Ok
70: Check open filename arg using perf trace + vfs_getname: Ok
71: Zstd perf.data compression/decompression : Ok

$ make -C tools/perf build-test | tee /wb/build-test
make: Entering directory '/home/acme/git/perf/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
make_no_libbpf_O: make NO_LIBBPF=1
make_static_O: make LDFLAGS=-static
make_util_map_o_O: make util/map.o
make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
make_clean_all_O: make clean all
make_doc_O: make doc
make_no_libperl_O: make NO_LIBPERL=1
make_install_O: make install
make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
make_with_babeltrace_O: make LIBBABELTRACE=1
make_install_prefix_O: make install prefix=/tmp/krava
make_no_auxtrace_O: make NO_AUXTRACE=1
make_no_backtrace_O: make NO_BACKTRACE=1
make_cscope_O: make cscope
make_tags_O: make tags
make_util_pmu_bison_o_O: make util/pmu-bison.o
make_no_libunwind_O: make NO_LIBUNWIND=1
make_no_libpython_O: make NO_LIBPYTHON=1
make_no_slang_O: make NO_SLANG=1
make_install_bin_O: make install-bin
make_install_prefix_slash_O: make install prefix=/tmp/krava/
make_perf_o_O: make perf.o
make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
make_no_libnuma_O: make NO_LIBNUMA=1
make_no_gtk2_O: make NO_GTK2=1
make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1
make_with_clangllvm_O: make LIBCLANGLLVM=1
make_no_libbionic_O: make NO_LIBBIONIC=1
make_no_libelf_O: make NO_LIBELF=1
make_pure_O: make
make_no_newt_O: make NO_NEWT=1
make_no_libaudit_O: make NO_LIBAUDIT=1
make_help_O: make help
make_debug_O: make DEBUG=1
make_no_demangle_O: make NO_DEMANGLE=1
OK
make: Leaving directory '/home/acme/git/perf/tools/perf'
$
* Re: [GIT PULL] perf/core improvements and fixes 2019-07-09 18:31 Arnaldo Carvalho de Melo @ 2019-07-13 9:13 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-07-13 9:13 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, David Carrillo Cisneros, Leo Yan, Luke Mujica, Numfor Mbiziwo-Tiapo, Song Liu, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit d1d59b817939821bee149e870ce7723f61ffb512: > > Merge tag 'perf-urgent-for-mingo-5.3-20190708-2' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-07-09 13:22:03 +0200) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190709 > > for you to fetch changes up to 323fd749821daab0f327ec86d707c4542963cdb0: > > perf intel-pt: Fix potential NULL pointer dereference found by the smatch tool (2019-07-09 10:13:28 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > Intel PT: > > Adrian Hunter: > > - Fix DROP VIEW power_events_view in the postgresql and sqlite export-db > python scripts. > > perf script: > > Song Liu: > > - Assume native_arch for pipe mode, fixing a segfault. > > perf inject: > > Arnaldo Carvalho de Melo: > > - The tool->read() call may pass a NULL evsel, handle it. > > core: > > Arnaldo Carvalho de Melo: > > - Move zalloc/zfree.c to tools/lib, further eroding tools/perf/util.[ch] > > - Use zfree() where applicable instead of open coded equivalent. > > - Add stdlib.h and some other headers to places where its needed and were > getting via util.h, that doesn't need that anymore. 
> > - Use list_del_init() more thoroughly. > > Miscellaneous: > > Leo Yan: > > - Fix use after free and potential NULL pointer derefs detected by the > smatch tool in various places. > > Luke Mujica: > > - Remove a couple unused variables in the parse-events code. > > Numfor Mbiziwo-Tiapo: > > - Initialize variable to suppress memory sanitizer warning in the > mmap-thread-lookup 'perf test' entry. > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Adrian Hunter (2): > perf scripts python: export-to-postgresql.py: Fix DROP VIEW power_events_view > perf scripts python: export-to-sqlite.py: Fix DROP VIEW power_events_view > > Arnaldo Carvalho de Melo (9): > perf inject: The tool->read() call may pass a NULL evsel, handle it > perf evsel: perf_evsel__name(NULL) is valid, no need to check evsel > perf tools: Add missing headers, mostly stdlib.h > perf namespaces: Move the conditional setns() prototype to namespaces.h > perf tools: Move get_current_dir_name() cond prototype out of util.h > tools lib: Adopt zalloc()/zfree() from tools/perf > perf tools: Use zfree() where applicable > perf tools: Use list_del_init() more thorougly > perf metricgroup: Add missing list_del_init() when flushing egroups list > > Leo Yan (10): > perf stat: Fix use-after-freed pointer detected by the smatch tool > perf top: Fix potential NULL pointer dereference detected by the smatch tool > perf annotate: Fix dereferencing freed memory found by the smatch tool > perf trace: Fix potential NULL pointer dereference found by the smatch tool > perf map: Fix potential NULL pointer dereference found by smatch tool > perf session: Fix potential NULL pointer dereference found by the smatch tool > perf cs-etm: Fix potential NULL pointer dereference found by the smatch tool > perf hists browser: Fix potential NULL pointer dereference found by the smatch tool > perf intel-bts: Fix potential NULL pointer dereference found by 
the smatch tool > perf intel-pt: Fix potential NULL pointer dereference found by the smatch tool > > Luke Mujica (2): > perf parse-events: Remove unused variable 'i' > perf parse-events: Remove unused variable: error > > Numfor Mbiziwo-Tiapo (1): > perf test mmap-thread-lookup: Initialize variable to suppress memory sanitizer warning > > Song Liu (1): > perf script: Assume native_arch for pipe mode > > tools/include/linux/zalloc.h | 12 +++++ > tools/lib/zalloc.c | 15 ++++++ > tools/perf/MANIFEST | 1 + > tools/perf/arch/arm/annotate/instructions.c | 1 + > tools/perf/arch/arm/util/auxtrace.c | 1 + > tools/perf/arch/arm/util/cs-etm.c | 1 + > tools/perf/arch/arm64/util/arm-spe.c | 1 + > tools/perf/arch/common.c | 3 +- > tools/perf/arch/powerpc/util/perf_regs.c | 4 +- > tools/perf/arch/s390/util/auxtrace.c | 1 + > tools/perf/arch/s390/util/header.c | 3 +- > tools/perf/arch/x86/util/event.c | 2 +- > tools/perf/arch/x86/util/intel-bts.c | 2 +- > tools/perf/arch/x86/util/intel-pt.c | 2 +- > tools/perf/arch/x86/util/perf_regs.c | 2 +- > tools/perf/bench/futex-hash.c | 3 +- > tools/perf/bench/futex-lock-pi.c | 3 +- > tools/perf/bench/mem-functions.c | 2 +- > tools/perf/bench/numa.c | 2 +- > tools/perf/builtin-annotate.c | 2 +- > tools/perf/builtin-bench.c | 2 +- > tools/perf/builtin-c2c.c | 2 +- > tools/perf/builtin-config.c | 1 + > tools/perf/builtin-diff.c | 2 +- > tools/perf/builtin-ftrace.c | 2 +- > tools/perf/builtin-help.c | 2 + > tools/perf/builtin-inject.c | 2 +- > tools/perf/builtin-kmem.c | 2 +- > tools/perf/builtin-kvm.c | 2 +- > tools/perf/builtin-lock.c | 10 ++-- > tools/perf/builtin-probe.c | 2 +- > tools/perf/builtin-record.c | 4 +- > tools/perf/builtin-report.c | 4 +- > tools/perf/builtin-sched.c | 2 +- > tools/perf/builtin-script.c | 5 +- > tools/perf/builtin-stat.c | 8 ++-- > tools/perf/builtin-timechart.c | 4 +- > tools/perf/builtin-top.c | 8 +++- > tools/perf/builtin-trace.c | 7 +-- > tools/perf/perf.c | 2 +- > tools/perf/pmu-events/jevents.c | 2 +- > 
tools/perf/scripts/python/export-to-postgresql.py | 2 +- > tools/perf/scripts/python/export-to-sqlite.py | 2 +- > tools/perf/tests/dwarf-unwind.c | 5 +- > tools/perf/tests/expr.c | 3 +- > tools/perf/tests/llvm.c | 1 + > tools/perf/tests/mem2node.c | 3 +- > tools/perf/tests/mmap-thread-lookup.c | 2 +- > tools/perf/tests/sample-parsing.c | 1 + > tools/perf/tests/switch-tracking.c | 3 +- > tools/perf/tests/thread-map.c | 3 +- > tools/perf/tests/vmlinux-kallsyms.c | 1 + > tools/perf/ui/browser.c | 2 +- > tools/perf/ui/browser.h | 1 + > tools/perf/ui/browsers/annotate.c | 2 +- > tools/perf/ui/browsers/hists.c | 17 +++++-- > tools/perf/ui/browsers/map.c | 1 + > tools/perf/ui/browsers/res_sample.c | 6 +-- > tools/perf/ui/browsers/scripts.c | 4 +- > tools/perf/ui/gtk/annotate.c | 2 +- > tools/perf/ui/gtk/util.c | 3 +- > tools/perf/ui/stdio/hist.c | 2 +- > tools/perf/ui/tui/setup.c | 1 + > tools/perf/ui/tui/util.c | 2 +- > tools/perf/util/Build | 5 ++ > tools/perf/util/annotate.c | 13 ++--- > tools/perf/util/arm-spe.c | 2 +- > tools/perf/util/auxtrace.c | 11 ++--- > tools/perf/util/bpf-loader.c | 3 +- > tools/perf/util/build-id.c | 1 + > tools/perf/util/call-path.c | 5 +- > tools/perf/util/callchain.c | 12 ++--- > tools/perf/util/cgroup.c | 4 +- > tools/perf/util/comm.c | 2 +- > tools/perf/util/config.c | 3 +- > tools/perf/util/counts.c | 2 +- > tools/perf/util/cpumap.c | 2 +- > tools/perf/util/cputopo.c | 5 +- > tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 1 + > tools/perf/util/cs-etm.c | 8 ++-- > tools/perf/util/data-convert-bt.c | 4 +- > tools/perf/util/data.c | 3 +- > tools/perf/util/db-export.c | 7 +-- > tools/perf/util/debug.c | 1 + > tools/perf/util/demangle-java.c | 3 +- > tools/perf/util/dso.c | 5 +- > tools/perf/util/dwarf-aux.c | 2 +- > tools/perf/util/env.c | 11 +++-- > tools/perf/util/event.c | 3 +- > tools/perf/util/evlist.c | 2 +- > tools/perf/util/evsel.c | 4 +- > tools/perf/util/get_current_dir_name.c | 6 +-- > tools/perf/util/get_current_dir_name.h | 
8 ++++ > tools/perf/util/header.c | 8 ++-- > tools/perf/util/help-unknown-cmd.c | 2 + > tools/perf/util/hist.c | 20 ++++---- > tools/perf/util/intel-bts.c | 7 ++- > .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 2 +- > tools/perf/util/intel-pt.c | 15 +++--- > tools/perf/util/jitdump.c | 7 ++- > tools/perf/util/llvm-utils.c | 4 +- > tools/perf/util/machine.c | 6 +-- > tools/perf/util/map.c | 9 ++-- > tools/perf/util/mem2node.c | 2 +- > tools/perf/util/metricgroup.c | 10 ++-- > tools/perf/util/mmap.c | 1 + > tools/perf/util/namespaces.c | 3 +- > tools/perf/util/namespaces.h | 4 ++ > tools/perf/util/ordered-events.c | 6 +-- > tools/perf/util/parse-branch-options.c | 2 +- > tools/perf/util/parse-events.c | 3 +- > tools/perf/util/parse-events.y | 2 - > tools/perf/util/parse-regs-options.c | 8 +++- > tools/perf/util/pmu.c | 4 +- > tools/perf/util/probe-event.c | 55 ++++++++++------------ > tools/perf/util/probe-file.c | 2 +- > tools/perf/util/probe-finder.c | 2 +- > tools/perf/util/pstack.c | 2 +- > tools/perf/util/python-ext-sources | 1 + > tools/perf/util/s390-cpumsf.c | 11 ++--- > tools/perf/util/session.c | 7 ++- > tools/perf/util/setns.c | 4 +- > tools/perf/util/srccode.c | 11 +++-- > tools/perf/util/srcline.c | 2 +- > tools/perf/util/stat-shadow.c | 3 +- > tools/perf/util/stat.c | 3 +- > tools/perf/util/strbuf.c | 3 +- > tools/perf/util/strfilter.c | 3 +- > tools/perf/util/strlist.c | 2 +- > tools/perf/util/svghelper.c | 2 +- > tools/perf/util/symbol-elf.c | 18 +++---- > tools/perf/util/symbol-minimal.c | 3 +- > tools/perf/util/symbol.c | 1 + > tools/perf/util/syscalltbl.c | 2 +- > tools/perf/util/target.c | 2 +- > tools/perf/util/thread-stack.c | 3 +- > tools/perf/util/thread.c | 6 +-- > tools/perf/util/thread_map.c | 4 +- > tools/perf/util/trace-event-info.c | 1 + > tools/perf/util/trace-event-scripting.c | 2 +- > tools/perf/util/unwind-libdw.c | 1 + > tools/perf/util/unwind-libunwind-local.c | 3 +- > tools/perf/util/usage.c | 3 ++ > tools/perf/util/util.h | 
17 ------- > tools/perf/util/values.c | 2 +- > tools/perf/util/vdso.c | 1 + > tools/perf/util/xyarray.c | 2 +- > 147 files changed, 375 insertions(+), 279 deletions(-) > create mode 100644 tools/include/linux/zalloc.h > create mode 100644 tools/lib/zalloc.c > create mode 100644 tools/perf/util/get_current_dir_name.h Pulled, thanks a lot Arnaldo! Ingo
* [GIT PULL] perf/core improvements and fixes @ 2019-07-03 3:27 Arnaldo Carvalho de Melo 2019-07-03 13:56 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-07-03 3:27 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Andi Kleen, Jin Yao, John Garry, Mariano Pache, Seeteena Thoufeek, Arnaldo Carvalho de Melo Hi Ingo, Please consider pulling, this is on top of perf-core-for-mingo-5.3-20190701. Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit 06c642c0e9fceafd16b1a4c80d44b1c09e282215: perf jevents: Use nonlocal include statements in pmu-events.c (2019-07-01 22:50:42 -0300) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190703 for you to fetch changes up to 15a108af1a18b597bfbd7f7b3c7b4823bfbaf8df: perf script: Allow specifying the files to process guest samples (2019-07-03 00:13:25 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf metrics: Andi Kleen: - Fixes for SkylakeX and CascadeLakeX Intel vendor events. - Avoid extra ':' for --raw metrics. - Don't include duration_time in group. perf script: Arnaldo Carvalho de Melo/Jiri Olsa: - Fix processing guest samples. perf diff: Jin Yao: - Do diffs by basic blocks. objtool: Jiri Olsa: - Fix build by linking against tools/lib/ctype.o sources. perf pmu: John Garry: - Support more complex PMU event aliasing. - Add support for Hisi hip08 DDRC, HHA and L3C PMU aliasing. 
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Andi Kleen (4): perf tools: Fix typos / broken sentences perf vendor events intel: Metric fixes for SKX/CLX perf list: Avoid extra : for --raw metrics perf tools metric: Don't include duration_time in group Arnaldo Carvalho de Melo (1): perf script: Allow specifying the files to process guest samples Jin Yao (7): perf symbol: Create block_info structure perf hists: Add block_info in hist_entry perf diff: Check if all data files with branch stacks perf diff: Use hists to manage basic blocks per symbol perf diff: Link same basic blocks among different data perf diff: Print the basic block cycles diff perf diff: Documentation -c cycles option Jiri Olsa (1): objtool: Fix build by linking against tools/lib/ctype.o sources John Garry (4): perf pmu: Support more complex PMU event aliasing perf jevents: Add support for Hisi hip08 DDRC PMU aliasing perf jevents: Add support for Hisi hip08 HHA PMU aliasing perf jevents: Add support for Hisi hip08 L3C PMU aliasing tools/objtool/Build | 5 + tools/perf/Documentation/perf-diff.txt | 17 +- tools/perf/Documentation/perf-report.txt | 2 +- tools/perf/Documentation/tips.txt | 2 +- tools/perf/builtin-diff.c | 382 ++++++++++++++++++++- tools/perf/builtin-script.c | 19 + .../arch/arm64/hisilicon/hip08/uncore-ddrc.json | 44 +++ .../arch/arm64/hisilicon/hip08/uncore-hha.json | 51 +++ .../arch/arm64/hisilicon/hip08/uncore-l3c.json | 37 ++ .../arch/x86/cascadelakex/clx-metrics.json | 4 +- .../pmu-events/arch/x86/skylakex/skx-metrics.json | 22 +- tools/perf/pmu-events/jevents.c | 3 + tools/perf/ui/stdio/hist.c | 27 ++ tools/perf/util/hist.c | 41 ++- tools/perf/util/hist.h | 8 + tools/perf/util/metricgroup.c | 21 +- tools/perf/util/pmu.c | 46 ++- tools/perf/util/sort.h | 13 + tools/perf/util/srcline.c | 4 +- tools/perf/util/symbol.c | 22 ++ tools/perf/util/symbol.h | 23 ++ tools/perf/util/symbol_conf.h | 4 +- 22 files 
changed, 753 insertions(+), 44 deletions(-) create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-ddrc.json create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-hha.json create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-l3c.json Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping having this in place. Investigating the failure for ubuntu:18.04-x-arm, doesn't look like something introduced by this patchkit. 
ubuntu:18.04-x-arm failure not yet resolved, doesn't seem related to this patchkit nor the previous one. $ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0-rc6.tar.xz $ dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 7.0.1 (tags/RELEASE_701/final) (based on LLVM 7.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.1.1 20190628 gcc-9-branch@272773, clang version 8.0.0 (tags/RELEASE_800/final) 17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 18 debian:9 : Ok gcc (Debian 
6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 19 debian:experimental : Ok gcc (Debian 8.3.0-7) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 20 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 21 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 22 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 23 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 24 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 25 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 26 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 27 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 28 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 29 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 30 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 31 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 32 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 33 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 34 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 35 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 36 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 37 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 
8.0.0-3.fc31) 38 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31) 39 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 40 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 41 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 42 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 43 manjaro:latest : Ok gcc (GCC) 8.3.0, clang version 8.0.0 (tags/RELEASE_800/final) 44 openmandriva:cooker : Ok gcc (GCC) 9.1.0 20190503 (OpenMandriva) 45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190520 [gcc-9-branch revision 271396], clang version 8.0.0 (tags/RELEASE_800/final 356365) 49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 51 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 52 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 53 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 54 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 55 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc 
(Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 61 ubuntu:18.04-x-arm : FAIL arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 arch/arm64/util/dwarf-regs.c: In function 'regs_query_register_offset': arch/arm64/util/dwarf-regs.c:26:43: error: dereferencing pointer to incomplete type 'struct user_pt_regs' (index * sizeof((struct user_pt_regs *)0)->regs[0]) ^ arch/arm64/util/dwarf-regs.c:91:11: note: in expansion of macro 'DWARFNUM2OFFSET' return DWARFNUM2OFFSET(roff->dwarfnum); ^~~~~~~~~~~~~~~ 62 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 63 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 71 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 72 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 73 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 74 ubuntu:19.04-x-arm64 : Ok 
aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 76 ubuntu:19.10 : Ok gcc (Ubuntu 8.3.0-14ubuntu1) 8.3.0, clang version 8.0.1-+rc1-1~exp1 (tags/RELEASE_801/rc1) # uname -a Linux quaco 5.2.0-rc7 #2 SMP Mon Jul 1 23:05:41 -03 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 15a108af1a18 perf script: Allow specifying the files to process guest samples # perf version --build-options perf version 5.2.rc6.g15a108af1a18 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 
'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : 
Ok 68: Use vfs_getname probe to get syscall args filenames : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_pure_O: make make_install_bin_O: make install-bin make_no_gtk2_O: make NO_GTK2=1 make_no_newt_O: make NO_NEWT=1 make_perf_o_O: make perf.o make_debug_O: make DEBUG=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_doc_O: make doc make_install_prefix_O: make install prefix=/tmp/krava make_no_demangle_O: make NO_DEMANGLE=1 make_util_map_o_O: make util/map.o make_no_libnuma_O: make NO_LIBNUMA=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_no_libperl_O: make NO_LIBPERL=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_clean_all_O: make clean all make_no_backtrace_O: make NO_BACKTRACE=1 make_cscope_O: make cscope make_no_libaudit_O: make NO_LIBAUDIT=1 make_static_O: make LDFLAGS=-static make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_help_O: make help make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 make_no_libbpf_O: make NO_LIBBPF=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_install_O: make install make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_libelf_O: make NO_LIBELF=1 make_no_slang_O: make NO_SLANG=1 make_tags_O: make tags make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_no_auxtrace_O: make NO_AUXTRACE=1 OK make: Leaving directory '/home/acme/git/perf/tools/perf' $
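A note on the ubuntu:18.04-x-arm FAIL entry in the build results of the message above: the compiler rejects the DWARFNUM2OFFSET macro in arch/arm64/util/dwarf-regs.c because 'struct user_pt_regs' is an incomplete type at that point. The message does not state the root cause, so what follows is only a minimal sketch of the idiom involved, not the fix. It assumes a hypothetical stand-in struct (demo_pt_regs) whose layout is meant to resemble the arm64 one; the names demo_pt_regs, DEMO_DWARFNUM2OFFSET and demo_dwarfnum_to_offset are invented for illustration and are not in the perf sources.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for 'struct user_pt_regs'. The field layout is
 * an assumption made for this sketch; the real definition comes from
 * the arm64 uapi headers.
 */
struct demo_pt_regs {
	uint64_t regs[31];
	uint64_t sp;
	uint64_t pc;
	uint64_t pstate;
};

/*
 * Same idiom as the macro in the error message: take sizeof() of a
 * member reached through a null pointer cast. sizeof does not evaluate
 * its operand, so nothing is dereferenced at run time, but the compiler
 * must see the full struct definition to know the member's type. With
 * only a forward declaration this fails with "dereferencing pointer to
 * incomplete type".
 */
#define DEMO_DWARFNUM2OFFSET(index) \
	((index) * sizeof(((struct demo_pt_regs *)0)->regs[0]))

/* Byte offset of general purpose register number 'dwarfnum'. */
static inline size_t demo_dwarfnum_to_offset(unsigned int dwarfnum)
{
	return DEMO_DWARFNUM2OFFSET(dwarfnum);
}
```

With the struct fully defined as above, demo_dwarfnum_to_offset(3) evaluates to 24, since each regs[] element is 8 bytes. Replacing the definition with a bare 'struct demo_pt_regs;' forward declaration should reproduce the same class of diagnostic seen in the log, which is consistent with the struct definition not being visible to that cross toolchain.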
* Re: [GIT PULL] perf/core improvements and fixes 2019-07-03 3:27 Arnaldo Carvalho de Melo @ 2019-07-03 13:56 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-07-03 13:56 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Andi Kleen, Jin Yao, John Garry, Mariano Pache, Seeteena Thoufeek, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo, > > Please consider pulling, this is on top of perf-core-for-mingo-5.3-20190701. > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit 06c642c0e9fceafd16b1a4c80d44b1c09e282215: > > perf jevents: Use nonlocal include statements in pmu-events.c (2019-07-01 22:50:42 -0300) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190703 > > for you to fetch changes up to 15a108af1a18b597bfbd7f7b3c7b4823bfbaf8df: > > perf script: Allow specifying the files to process guest samples (2019-07-03 00:13:25 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > perf metrics: > > Andi Kleen: > > - Fixes for SkylakeX and CascadeLakeX Intel vendor events. > > - Avoid extra ':' for --raw metrics. > > - Don't include duration_time in group. > > perf script: > > Arnaldo Carvalho de Melo/Jiri Olsa: > > - Fix processing guest samples. > > perf diff: > > Jin Yao: > > - Do diffs by basic blocks. > > objtool: > > Jiri Olsa: > > - Fix build by linking against tools/lib/ctype.o sources. > > perf pmu: > > John Garry: > > - Support more complex PMU event aliasing. > > - Add support for Hisi hip08 DDRC, HHA and L3C PMU aliasing. 
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Andi Kleen (4): > perf tools: Fix typos / broken sentences > perf vendor events intel: Metric fixes for SKX/CLX > perf list: Avoid extra : for --raw metrics > perf tools metric: Don't include duration_time in group > > Arnaldo Carvalho de Melo (1): > perf script: Allow specifying the files to process guest samples > > Jin Yao (7): > perf symbol: Create block_info structure > perf hists: Add block_info in hist_entry > perf diff: Check if all data files with branch stacks > perf diff: Use hists to manage basic blocks per symbol > perf diff: Link same basic blocks among different data > perf diff: Print the basic block cycles diff > perf diff: Documentation -c cycles option > > Jiri Olsa (1): > objtool: Fix build by linking against tools/lib/ctype.o sources > > John Garry (4): > perf pmu: Support more complex PMU event aliasing > perf jevents: Add support for Hisi hip08 DDRC PMU aliasing > perf jevents: Add support for Hisi hip08 HHA PMU aliasing > perf jevents: Add support for Hisi hip08 L3C PMU aliasing > > tools/objtool/Build | 5 + > tools/perf/Documentation/perf-diff.txt | 17 +- > tools/perf/Documentation/perf-report.txt | 2 +- > tools/perf/Documentation/tips.txt | 2 +- > tools/perf/builtin-diff.c | 382 ++++++++++++++++++++- > tools/perf/builtin-script.c | 19 + > .../arch/arm64/hisilicon/hip08/uncore-ddrc.json | 44 +++ > .../arch/arm64/hisilicon/hip08/uncore-hha.json | 51 +++ > .../arch/arm64/hisilicon/hip08/uncore-l3c.json | 37 ++ > .../arch/x86/cascadelakex/clx-metrics.json | 4 +- > .../pmu-events/arch/x86/skylakex/skx-metrics.json | 22 +- > tools/perf/pmu-events/jevents.c | 3 + > tools/perf/ui/stdio/hist.c | 27 ++ > tools/perf/util/hist.c | 41 ++- > tools/perf/util/hist.h | 8 + > tools/perf/util/metricgroup.c | 21 +- > tools/perf/util/pmu.c | 46 ++- > tools/perf/util/sort.h | 13 + > tools/perf/util/srcline.c | 4 +- > 
tools/perf/util/symbol.c | 22 ++ > tools/perf/util/symbol.h | 23 ++ > tools/perf/util/symbol_conf.h | 4 +- > 22 files changed, 753 insertions(+), 44 deletions(-) > create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-ddrc.json > create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-hha.json > create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-l3c.json Pulled, thanks a lot Arnaldo! Ingo
* [GIT PULL] perf/core improvements and fixes @ 2019-07-02 2:25 Arnaldo Carvalho de Melo 2019-07-03 13:55 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-07-02 2:25 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Andi Kleen, Kyle Meyer, Luke Mujica, Mao Han, Numfor Mbiziwo-Tiapo, Arnaldo Carvalho de Melo Hi Ingo, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit fd7d55172d1e2e501e6da0a5c1de25f06612dc2e: perf/cgroups: Don't rotate events for cgroups unnecessarily (2019-06-24 19:30:04 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190701 for you to fetch changes up to 06c642c0e9fceafd16b1a4c80d44b1c09e282215: perf jevents: Use nonlocal include statements in pmu-events.c (2019-07-01 22:50:42 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf annotate: Mao Han: - Add support for the csky processor architecture. perf stat: Andi Kleen: - Fix metrics with --no-merge. - Don't merge events in the same PMU. - Fix group lookup for metric group. Intel PT: Adrian Hunter: - Improve CBR (Core to Bus Ratio) packets support. - Fix thread stack return from kernel for kernel only case. - Export power and ptwrite events to sqlite and postgresql. core libraries: Arnaldo Carvalho de Melo: - Find routines in tools/perf/util/ that have implementations in the kernel libraries (lib/*.c), such as strreplace(), strim(), skip_spaces() and reuse them after making a copy into tools/lib and tools/include/. 
This continues the effort of having tools/ code looking as much as possible like kernel source code, to help encourage people to work on both the kernel and in tools hosted in the kernel sources. That in turn will help moving stuff that uses those routines to tools/lib/perf/ where they will be made available for use in other tools. In the process ditch old cruft, remove unused variables and add missing include directives for headers providing things used in places that were building by sheer luck. Kyle Meyer: - Bump MAX_NR_CPUS and MAX_CACHES to get these tools to work on more machines. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Adrian Hunter (9): perf thread-stack: Fix thread stack return from kernel for kernel-only case perf thread-stack: Eliminate code duplicating thread_stack__pop_ks() perf intel-pt: Decoder to output CBR changes immediately perf intel-pt: Cater for CBR change in PSB+ perf intel-pt: Add CBR value to decoder state perf intel-pt: Synthesize CBR events when last seen value changes perf db-export: Export synth events perf scripts python: export-to-sqlite.py: Export Intel PT power and ptwrite events perf scripts python: export-to-postgresql.py: Export Intel PT power and ptwrite events Andi Kleen (4): perf stat: Make metric event lookup more robust perf stat: Don't merge events in the same PMU perf stat: Fix group lookup for metric group perf stat: Fix metrics with --no-merge Arnaldo Carvalho de Melo (26): perf ctype: Remove unused 'graph_line' variable perf ui stdio: No need to use 'spaces' to left align perf ctype: Remove now unused 'spaces' variable perf string: Move 'dots' and 'graph_dotted_line' out of sane_ctype.h tools x86 machine: Add missing util.h to pick up 'page_size' perf kallsyms: Adopt hex2u64 from tools/perf/util/util.h perf symbols: We need util.h in symbol-elf.c for zfree() perf tools: Remove old baggage that is util/include/linux/ctype.h perf tools: Add 
missing util.h to pick up 'page_size' variable tools perf: Move from sane_ctype.h obtained from git to the Linux's original perf tools: Use linux/ctype.h in more places tools lib: Adopt skip_spaces() from the kernel sources perf stat: Use recently introduced skip_spaces() perf header: Use skip_spaces() in __write_cpudesc() perf time-utils: Use skip_spaces() perf probe: Use skip_spaces() for argv handling perf strfilter: Use skip_spaces() perf metricgroup: Use strsep() perf report: Use skip_spaces() perf tools: Ditch rtrim(), use skip_spaces() to get closer to the kernel tools lib: Adopt strim() from the kernel perf tools: Remove trim() implementation, use tools/lib's strim() perf tools: Ditch rtrim(), use strim() from tools/lib tools lib: Adopt strreplace() from the kernel perf tools: Drop strxfrchar(), use strreplace() equivalent from kernel tools lib: Move argv_{split,free} from tools/perf/util/ Kyle Meyer (1): perf tools: Increase MAX_NR_CPUS and MAX_CACHES Luke Mujica (1): perf jevents: Use nonlocal include statements in pmu-events.c Mao Han (1): perf annotate: Add csky support Numfor Mbiziwo-Tiapo (1): perf tools: Fix cache.h include directive tools/include/linux/ctype.h | 75 ++++++ tools/include/linux/string.h | 11 +- tools/lib/argv_split.c | 100 ++++++++ tools/lib/ctype.c | 35 +++ tools/lib/string.c | 55 +++++ tools/lib/symbol/kallsyms.c | 14 +- tools/lib/symbol/kallsyms.h | 2 + tools/perf/MANIFEST | 2 + tools/perf/arch/arm/util/cs-etm.c | 1 + tools/perf/arch/csky/annotate/instructions.c | 48 ++++ tools/perf/arch/s390/util/header.c | 2 +- tools/perf/arch/x86/tests/intel-cqm.c | 1 + tools/perf/arch/x86/util/intel-pt.c | 1 + tools/perf/arch/x86/util/machine.c | 3 +- tools/perf/builtin-kmem.c | 3 +- tools/perf/builtin-report.c | 5 +- tools/perf/builtin-sched.c | 3 +- tools/perf/builtin-script.c | 14 +- tools/perf/builtin-stat.c | 2 +- tools/perf/builtin-top.c | 3 +- tools/perf/builtin-trace.c | 2 +- tools/perf/check-headers.sh | 2 + tools/perf/perf.c | 1 + 
tools/perf/perf.h | 2 +- tools/perf/pmu-events/jevents.c | 4 +- tools/perf/scripts/python/export-to-postgresql.py | 251 +++++++++++++++++++++ tools/perf/scripts/python/export-to-sqlite.py | 239 ++++++++++++++++++++ tools/perf/tests/builtin-test.c | 3 +- tools/perf/tests/code-reading.c | 2 +- tools/perf/ui/browser.c | 4 +- tools/perf/ui/browsers/hists.c | 10 +- tools/perf/ui/browsers/map.c | 2 +- tools/perf/ui/gtk/hists.c | 5 +- tools/perf/ui/progress.c | 2 +- tools/perf/ui/stdio/hist.c | 16 +- tools/perf/util/Build | 9 + tools/perf/util/annotate.c | 20 +- tools/perf/util/auxtrace.c | 2 +- tools/perf/util/build-id.c | 2 +- tools/perf/util/config.c | 2 +- tools/perf/util/cpumap.c | 2 +- tools/perf/util/ctype.c | 49 ---- tools/perf/util/data-convert-bt.c | 2 +- tools/perf/util/debug.c | 2 +- tools/perf/util/demangle-java.c | 2 +- tools/perf/util/dso.c | 3 +- tools/perf/util/env.c | 2 +- tools/perf/util/event.c | 6 +- tools/perf/util/evsel.c | 3 +- tools/perf/util/header.c | 15 +- tools/perf/util/include/linux/ctype.h | 1 - .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 24 +- .../perf/util/intel-pt-decoder/intel-pt-decoder.h | 1 + tools/perf/util/intel-pt.c | 65 ++++-- tools/perf/util/jitdump.c | 2 +- tools/perf/util/machine.c | 3 +- tools/perf/util/metricgroup.c | 52 +++-- tools/perf/util/pmu.c | 5 +- tools/perf/util/print_binary.c | 2 +- tools/perf/util/probe-event.c | 2 +- tools/perf/util/probe-finder.h | 2 +- tools/perf/util/python-ext-sources | 3 +- tools/perf/util/python.c | 1 + tools/perf/util/sane_ctype.h | 52 ----- .../util/scripting-engines/trace-event-python.c | 46 +++- tools/perf/util/srcline.c | 3 +- tools/perf/util/stat-display.c | 14 +- tools/perf/util/stat-shadow.c | 23 +- tools/perf/util/strfilter.c | 6 +- tools/perf/util/string.c | 169 +------------- tools/perf/util/string2.h | 15 +- tools/perf/util/symbol-elf.c | 3 +- tools/perf/util/symbol.c | 2 +- tools/perf/util/thread-stack.c | 48 ++-- tools/perf/util/thread_map.c | 3 +- 
tools/perf/util/time-utils.c | 8 +- tools/perf/util/trace-event-parse.c | 2 +- tools/perf/util/util.c | 13 -- tools/perf/util/util.h | 1 - 79 files changed, 1167 insertions(+), 450 deletions(-) create mode 100644 tools/include/linux/ctype.h create mode 100644 tools/lib/argv_split.c create mode 100644 tools/lib/ctype.c create mode 100644 tools/perf/arch/csky/annotate/instructions.c delete mode 100644 tools/perf/util/ctype.c delete mode 100644 tools/perf/util/include/linux/ctype.h delete mode 100644 tools/perf/util/sane_ctype.h Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. 
It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping having this in place. Investigating the failure for ubuntu:18.04-x-arm, doesn't look like something introduced by this patchkit. $ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0-rc6.tar.xz $ dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 7.0.1 (tags/RELEASE_701/final) (based on LLVM 7.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 16 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 
3.5.0) 17 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 18 debian:experimental : Ok gcc (Debian 8.3.0-7) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 19 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 20 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 21 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 22 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 23 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 24 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 25 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 26 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 27 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 28 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 29 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 30 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 31 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 32 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 33 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 34 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 35 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 36 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), 
clang version 8.0.0 (Fedora 8.0.0-3.fc31) 37 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31) 38 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 39 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 40 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 41 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 42 manjaro:latest : Ok gcc (GCC) 8.3.0, clang version 8.0.0 (tags/RELEASE_800/final) 43 openmandriva:cooker : Ok gcc (GCC) 9.1.0 20190503 (OpenMandriva) 44 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 45 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 46 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 47 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190520 [gcc-9-branch revision 271396], clang version 8.0.0 (tags/RELEASE_800/final 356365) 48 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 49 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 50 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 51 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 52 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 53 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 54 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 55 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-powerpc64 : 
Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 60 ubuntu:18.04-x-arm : FAIL arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 arch/arm64/util/dwarf-regs.c: In function 'regs_query_register_offset': arch/arm64/util/dwarf-regs.c:26:43: error: dereferencing pointer to incomplete type 'struct user_pt_regs' (index * sizeof((struct user_pt_regs *)0)->regs[0]) ^ arch/arm64/util/dwarf-regs.c:91:11: note: in expansion of macro 'DWARFNUM2OFFSET' return DWARFNUM2OFFSET(roff->dwarfnum); ^~~~~~~~~~~~~~~ mv: cannot stat '/tmp/build/perf/arch/arm64/util/.dwarf-regs.o.tmp': No such file or directory 61 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0 62 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 63 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 64 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 65 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 66 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 67 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 68 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 69 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 70 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 71 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 72 
ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 73 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 74 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.10 : Ok gcc (Ubuntu 8.3.0-14ubuntu1) 8.3.0, clang version 8.0.1-+rc1-1~exp1 (tags/RELEASE_801/rc1) $ # uname -a Linux quaco 5.2.0-rc7 #2 SMP Mon Jul 1 23:05:41 -03 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 06c642c0e9fc perf jevents: Use nonlocal include statements in pmu-events.c # perf version --build-options perf version 5.2.rc6.g06c642c0e9fc dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat 
event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel 
PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_no_libbpf_O: make NO_LIBBPF=1 make_no_libperl_O: make NO_LIBPERL=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_static_O: make LDFLAGS=-static make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_with_babeltrace_O: make LIBBABELTRACE=1 make_util_pmu_bison_o_O: make util/pmu-bison.o make_install_prefix_O: make install prefix=/tmp/krava make_no_libelf_O: make NO_LIBELF=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_no_demangle_O: make NO_DEMANGLE=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_no_gtk2_O: make NO_GTK2=1 make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_doc_O: make doc make_help_O: make help make_perf_o_O: make perf.o make_no_libpython_O: make NO_LIBPYTHON=1 make_no_libaudit_O: make NO_LIBAUDIT=1 make_debug_O: make DEBUG=1 make_pure_O: make make_install_O: make install make_install_bin_O: make install-bin make_no_newt_O: make NO_NEWT=1 make_cscope_O: make cscope make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 make_tags_O: make tags make_no_libnuma_O: make NO_LIBNUMA=1 make_util_map_o_O: make util/map.o make_no_slang_O: make NO_SLANG=1 make_clean_all_O: make clean all make_with_clangllvm_O: make LIBCLANGLLVM=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 
OK
make: Leaving directory '/home/acme/git/perf/tools/perf'
$
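Note on the ubuntu:18.04-x-arm failure in the test results above: the macro named in the error applies sizeof to a member reached through a cast null pointer. What follows is a minimal sketch of that idiom, not the perf source. `struct demo_pt_regs`, `DEMO_DWARFNUM2OFFSET` and `demo_reg_offset()` are hypothetical names made up for illustration, and the struct layout is an assumed stand-in. sizeof does not evaluate its operand, so nothing is dereferenced at run time, but the compiler still needs the complete struct definition. With only a forward declaration in scope it reports the "incomplete type" error seen in the log.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the register struct named in the build log.
 * The layout here is an assumption made only for this sketch.
 */
struct demo_pt_regs {
	uint64_t regs[31];
	uint64_t sp;
	uint64_t pc;
	uint64_t pstate;
};

/*
 * Same shape as the macro quoted in the error message: the element size
 * is taken through a null pointer cast. The operand of sizeof is not
 * evaluated, but the struct must be fully defined at this point.
 */
#define DEMO_DWARFNUM2OFFSET(index) \
	((index) * sizeof(((struct demo_pt_regs *)0)->regs[0]))

/* Byte offset of general purpose register number 'dwarfnum' (illustrative). */
static size_t demo_reg_offset(unsigned int dwarfnum)
{
	return DEMO_DWARFNUM2OFFSET(dwarfnum);
}
```

If the struct definition above is replaced with a bare `struct demo_pt_regs;` declaration, the macro no longer compiles, which is the class of error the cross build reported.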
* Re: [GIT PULL] perf/core improvements and fixes
  2019-07-02  2:25 Arnaldo Carvalho de Melo
@ 2019-07-03 13:55 ` Ingo Molnar
  0 siblings, 0 replies; 133+ messages in thread
From: Ingo Molnar @ 2019-07-03 13:55 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams,
	linux-kernel, linux-perf-users, Adrian Hunter, Andi Kleen,
	Kyle Meyer, Luke Mujica, Mao Han, Numfor Mbiziwo-Tiapo,
	Arnaldo Carvalho de Melo

* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
>
> 	Please consider pulling,
>
> Best regards,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit fd7d55172d1e2e501e6da0a5c1de25f06612dc2e:
>
>   perf/cgroups: Don't rotate events for cgroups unnecessarily (2019-06-24 19:30:04 +0200)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190701
>
> for you to fetch changes up to 06c642c0e9fceafd16b1a4c80d44b1c09e282215:
>
>   perf jevents: Use nonlocal include statements in pmu-events.c (2019-07-01 22:50:42 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> perf annotate:
>
>   Mao Han:
>
>   - Add support for the csky processor architecture.
>
> perf stat:
>
>   Andi Kleen:
>
>   - Fix metrics with --no-merge.
>
>   - Don't merge events in the same PMU.
>
>   - Fix group lookup for metric group.
>
> Intel PT:
>
>   Adrian Hunter:
>
>   - Improve CBR (Core to Bus Ratio) packets support.
>
>   - Fix thread stack return from kernel for kernel only case.
>
>   - Export power and ptwrite events to sqlite and postgresql.
>
> core libraries:
>
>   Arnaldo Carvalho de Melo:
>
>   - Find routines in tools/perf/util/ that have implementations in the kernel
>     libraries (lib/*.c), such as strreplace(), strim(), skip_spaces() and reuse
>     them after making a copy into tools/lib and tools/include/.
> > This continues the effort of having tools/ code looking as much as possible > like kernel source code, to help encourage people to work on both the kernel > and in tools hosted in the kernel sources. > > That in turn will help moving stuff that uses those routines to > tools/lib/perf/ where they will be made available for use in other tools. > > In the process ditch old cruft, remove unused variables and add missing > include directives for headers providing things used in places that were > building by sheer luck. > > Kyle Meyer: > > - Bump MAX_NR_CPUS and MAX_CACHES to get these tools to work on more machines. > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Adrian Hunter (9): > perf thread-stack: Fix thread stack return from kernel for kernel-only case > perf thread-stack: Eliminate code duplicating thread_stack__pop_ks() > perf intel-pt: Decoder to output CBR changes immediately > perf intel-pt: Cater for CBR change in PSB+ > perf intel-pt: Add CBR value to decoder state > perf intel-pt: Synthesize CBR events when last seen value changes > perf db-export: Export synth events > perf scripts python: export-to-sqlite.py: Export Intel PT power and ptwrite events > perf scripts python: export-to-postgresql.py: Export Intel PT power and ptwrite events > > Andi Kleen (4): > perf stat: Make metric event lookup more robust > perf stat: Don't merge events in the same PMU > perf stat: Fix group lookup for metric group > perf stat: Fix metrics with --no-merge > > Arnaldo Carvalho de Melo (26): > perf ctype: Remove unused 'graph_line' variable > perf ui stdio: No need to use 'spaces' to left align > perf ctype: Remove now unused 'spaces' variable > perf string: Move 'dots' and 'graph_dotted_line' out of sane_ctype.h > tools x86 machine: Add missing util.h to pick up 'page_size' > perf kallsyms: Adopt hex2u64 from tools/perf/util/util.h > perf symbols: We need util.h in symbol-elf.c for 
zfree() > perf tools: Remove old baggage that is util/include/linux/ctype.h > perf tools: Add missing util.h to pick up 'page_size' variable > tools perf: Move from sane_ctype.h obtained from git to the Linux's original > perf tools: Use linux/ctype.h in more places > tools lib: Adopt skip_spaces() from the kernel sources > perf stat: Use recently introduced skip_spaces() > perf header: Use skip_spaces() in __write_cpudesc() > perf time-utils: Use skip_spaces() > perf probe: Use skip_spaces() for argv handling > perf strfilter: Use skip_spaces() > perf metricgroup: Use strsep() > perf report: Use skip_spaces() > perf tools: Ditch rtrim(), use skip_spaces() to get closer to the kernel > tools lib: Adopt strim() from the kernel > perf tools: Remove trim() implementation, use tools/lib's strim() > perf tools: Ditch rtrim(), use strim() from tools/lib > tools lib: Adopt strreplace() from the kernel > perf tools: Drop strxfrchar(), use strreplace() equivalent from kernel > tools lib: Move argv_{split,free} from tools/perf/util/ > > Kyle Meyer (1): > perf tools: Increase MAX_NR_CPUS and MAX_CACHES > > Luke Mujica (1): > perf jevents: Use nonlocal include statements in pmu-events.c > > Mao Han (1): > perf annotate: Add csky support > > Numfor Mbiziwo-Tiapo (1): > perf tools: Fix cache.h include directive > > tools/include/linux/ctype.h | 75 ++++++ > tools/include/linux/string.h | 11 +- > tools/lib/argv_split.c | 100 ++++++++ > tools/lib/ctype.c | 35 +++ > tools/lib/string.c | 55 +++++ > tools/lib/symbol/kallsyms.c | 14 +- > tools/lib/symbol/kallsyms.h | 2 + > tools/perf/MANIFEST | 2 + > tools/perf/arch/arm/util/cs-etm.c | 1 + > tools/perf/arch/csky/annotate/instructions.c | 48 ++++ > tools/perf/arch/s390/util/header.c | 2 +- > tools/perf/arch/x86/tests/intel-cqm.c | 1 + > tools/perf/arch/x86/util/intel-pt.c | 1 + > tools/perf/arch/x86/util/machine.c | 3 +- > tools/perf/builtin-kmem.c | 3 +- > tools/perf/builtin-report.c | 5 +- > tools/perf/builtin-sched.c | 3 +- > 
tools/perf/builtin-script.c | 14 +- > tools/perf/builtin-stat.c | 2 +- > tools/perf/builtin-top.c | 3 +- > tools/perf/builtin-trace.c | 2 +- > tools/perf/check-headers.sh | 2 + > tools/perf/perf.c | 1 + > tools/perf/perf.h | 2 +- > tools/perf/pmu-events/jevents.c | 4 +- > tools/perf/scripts/python/export-to-postgresql.py | 251 +++++++++++++++++++++ > tools/perf/scripts/python/export-to-sqlite.py | 239 ++++++++++++++++++++ > tools/perf/tests/builtin-test.c | 3 +- > tools/perf/tests/code-reading.c | 2 +- > tools/perf/ui/browser.c | 4 +- > tools/perf/ui/browsers/hists.c | 10 +- > tools/perf/ui/browsers/map.c | 2 +- > tools/perf/ui/gtk/hists.c | 5 +- > tools/perf/ui/progress.c | 2 +- > tools/perf/ui/stdio/hist.c | 16 +- > tools/perf/util/Build | 9 + > tools/perf/util/annotate.c | 20 +- > tools/perf/util/auxtrace.c | 2 +- > tools/perf/util/build-id.c | 2 +- > tools/perf/util/config.c | 2 +- > tools/perf/util/cpumap.c | 2 +- > tools/perf/util/ctype.c | 49 ---- > tools/perf/util/data-convert-bt.c | 2 +- > tools/perf/util/debug.c | 2 +- > tools/perf/util/demangle-java.c | 2 +- > tools/perf/util/dso.c | 3 +- > tools/perf/util/env.c | 2 +- > tools/perf/util/event.c | 6 +- > tools/perf/util/evsel.c | 3 +- > tools/perf/util/header.c | 15 +- > tools/perf/util/include/linux/ctype.h | 1 - > .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 24 +- > .../perf/util/intel-pt-decoder/intel-pt-decoder.h | 1 + > tools/perf/util/intel-pt.c | 65 ++++-- > tools/perf/util/jitdump.c | 2 +- > tools/perf/util/machine.c | 3 +- > tools/perf/util/metricgroup.c | 52 +++-- > tools/perf/util/pmu.c | 5 +- > tools/perf/util/print_binary.c | 2 +- > tools/perf/util/probe-event.c | 2 +- > tools/perf/util/probe-finder.h | 2 +- > tools/perf/util/python-ext-sources | 3 +- > tools/perf/util/python.c | 1 + > tools/perf/util/sane_ctype.h | 52 ----- > .../util/scripting-engines/trace-event-python.c | 46 +++- > tools/perf/util/srcline.c | 3 +- > tools/perf/util/stat-display.c | 14 +- > 
tools/perf/util/stat-shadow.c                      |  23 +-
>  tools/perf/util/strfilter.c                        |   6 +-
>  tools/perf/util/string.c                           | 169 +-------------
>  tools/perf/util/string2.h                          |  15 +-
>  tools/perf/util/symbol-elf.c                       |   3 +-
>  tools/perf/util/symbol.c                           |   2 +-
>  tools/perf/util/thread-stack.c                     |  48 ++--
>  tools/perf/util/thread_map.c                       |   3 +-
>  tools/perf/util/time-utils.c                       |   8 +-
>  tools/perf/util/trace-event-parse.c                |   2 +-
>  tools/perf/util/util.c                             |  13 --
>  tools/perf/util/util.h                             |   1 -
>  79 files changed, 1167 insertions(+), 450 deletions(-)
>  create mode 100644 tools/include/linux/ctype.h
>  create mode 100644 tools/lib/argv_split.c
>  create mode 100644 tools/lib/ctype.c
>  create mode 100644 tools/perf/arch/csky/annotate/instructions.c
>  delete mode 100644 tools/perf/util/ctype.c
>  delete mode 100644 tools/perf/util/include/linux/ctype.h
>  delete mode 100644 tools/perf/util/sane_ctype.h

Pulled, thanks a lot Arnaldo!

	Ingo
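For readers unfamiliar with the helpers named in the quoted "core libraries" summary (skip_spaces(), strim(), strreplace()), here is a minimal userspace sketch of what such routines do. It is an illustration modelled on their documented behaviour, not the code copied into tools/lib. The `demo_` prefixed names are hypothetical and use the C library's <ctype.h> rather than the kernel's ctype tables.

```c
#include <ctype.h>
#include <string.h>

/* Return a pointer to the first non-whitespace character in 'str'. */
static char *demo_skip_spaces(const char *str)
{
	while (isspace((unsigned char)*str))
		++str;
	return (char *)str;
}

/*
 * Strip trailing whitespace in place by writing a new terminator, then
 * return a pointer past any leading whitespace. The returned pointer may
 * differ from 's', so callers that need to free() must keep the original.
 */
static char *demo_strim(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';

	return demo_skip_spaces(s);
}

/*
 * Replace every occurrence of 'old_c' with 'new_c' in place.
 * Returns a pointer to the terminating NUL of the string.
 */
static char *demo_strreplace(char *s, char old_c, char new_c)
{
	for (; *s; ++s)
		if (*s == old_c)
			*s = new_c;
	return s;
}
```

A typical use is cleaning a line read from a file before parsing it: `demo_strim(buf)` yields the payload without surrounding blanks, and `demo_strreplace(name, '/', '_')` turns a path-like string into something usable as a single file name component.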
* [GIT PULL] perf/core improvements and fixes @ 2019-06-21 17:38 Arnaldo Carvalho de Melo 2019-06-22 6:28 ` Ingo Molnar 0 siblings, 1 reply; 133+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-06-21 17:38 UTC (permalink / raw) To: Ingo Molnar, Thomas Gleixner Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Florian Fainelli, John Garry, Laura Abbott, Leo Yan, Mathieu Poirier, Raphael Gault, Suzuki K Poulose, Arnaldo Carvalho de Melo Hi Ingo, Please consider pulling, Best regards, - Arnaldo Test results at the end of this message, as usual. The following changes since commit 3ce5aceb5dee298b082adfa2baa0df5a447c1b0b: Merge tag 'perf-core-for-mingo-5.3-20190611' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-06-17 20:48:14 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190621 for you to fetch changes up to 3469fa84c1631face938efc42b3f488a2c2504e0: tools build: Fix the zstd test in the test-all.c common case feature test (2019-06-18 18:44:24 -0300) ---------------------------------------------------------------- perf/core improvements and fixes: perf trace: Arnaldo Carvalho de Melo: - Fix exclusion of not available syscall names from selector list. - Fixup pointer arithmetic when consuming augmented syscall args. Intel PT: Adrian Hunter: - Add support for decoding PEBS via PT packets. See: https://software.intel.com/en-us/articles/intel-sdm May 2019 version: Vol. 3B 18.5.5.2 PEBS output to Intel® Processor Trace for more details about it. ARM64: John Garry: - Fix uncore PMU alias list for ARM64 Raphael Gault: - Compile tests unconditionally. cs-etm: Mathieu Poirier: - Optimize option setup for CPU-wide sessions. build: Florian Fainelli: - Don't hardcode host include path for libslang, fixing up building with it in cross build environments. 
Arnaldo Carvalho de Melo: - Check if gettid() is available before providing helper, fixing the build when using the latest glibc version, where a helper for gettid() is finally present. - Fix building with libslang in systems where it is located in slang/slang.h. - Fix fast path test for zstd library. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> ---------------------------------------------------------------- Adrian Hunter (11): perf intel-pt: Add new packets for PEBS via PT perf intel-pt: Add Intel PT packet decoder test perf intel-pt: Add decoder support for PEBS via PT perf intel-pt: Prepare to synthesize PEBS samples perf intel-pt: Factor out common sample preparation for re-use perf intel-pt: Synthesize PEBS sample basic information perf intel-pt: Add gp registers to synthesized PEBS sample perf intel-pt: Add XMM registers to synthesized PEBS sample perf intel-pt: Add LBR information to synthesized PEBS sample perf intel-pt: Add memory information to synthesized PEBS sample perf intel-pt: Add callchain to synthesized PEBS sample Arnaldo Carvalho de Melo (10): tools build: Check if gettid() is available before providing helper perf trace: Fix exclusion of not available syscall names from selector list perf trace: Streamline validation of select syscall names list tools build feature tests: Add missing SPDX headers perf tests: Add missing SPDX headers perf trace: Fixup pointer arithmetic when consuming augmented syscall args perf evsel: Make perf_evsel__name() accept a NULL argument tools build: Add test to check if slang.h is in /usr/include/slang/ perf build: Handle slang being in /usr/include and in /usr/include/slang/ tools build: Fix the zstd test in the test-all.c common case feature test Florian Fainelli (1): perf tools: Don't hardcode host include path for libslang John Garry (1): perf pmu: Fix uncore PMU alias list for ARM64 Mathieu Poirier (1): perf: cs-etm: Optimize option setup for CPU-wide sessions Raphael Gault (1): perf tests arm64: 
Compile tests unconditionally tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 10 +- tools/build/feature/test-all.c | 7 +- tools/build/feature/test-fortify-source.c | 1 + tools/build/feature/test-gettid.c | 11 + tools/build/feature/test-hello.c | 1 + tools/build/feature/test-libslang-include-subdir.c | 7 + tools/build/feature/test-setns.c | 1 + tools/perf/Makefile.config | 16 +- tools/perf/arch/arm/util/cs-etm.c | 20 +- tools/perf/arch/arm64/Build | 2 +- tools/perf/arch/arm64/tests/Build | 2 +- tools/perf/arch/x86/include/arch-tests.h | 1 + tools/perf/arch/x86/tests/Build | 2 +- tools/perf/arch/x86/tests/arch-tests.c | 4 + .../arch/x86/tests/intel-pt-pkt-decoder-test.c | 304 +++++++++++++++++++++ tools/perf/builtin-trace.c | 20 +- tools/perf/jvmti/jvmti_agent.c | 2 + tools/perf/tests/Build | 2 + tools/perf/tests/bp_account.c | 1 + tools/perf/tests/bpf-script-example.c | 1 + tools/perf/tests/bpf-script-test-kbuild.c | 1 + tools/perf/tests/bpf-script-test-prologue.c | 1 + tools/perf/tests/bpf-script-test-relocation.c | 1 + tools/perf/tests/bpf.c | 1 + tools/perf/tests/map_groups.c | 1 + tools/perf/tests/mem.c | 1 + tools/perf/tests/mem2node.c | 1 + tools/perf/tests/shell/lib/probe.sh | 1 + tools/perf/tests/shell/probe_vfs_getname.sh | 3 +- .../tests/shell/record+probe_libc_inet_pton.sh | 1 + .../tests/shell/record+script_probe_vfs_getname.sh | 1 + tools/perf/tests/shell/record+zstd_comp_decomp.sh | 2 + tools/perf/tests/shell/trace+probe_vfs_getname.sh | 1 + tools/perf/ui/libslang.h | 5 + tools/perf/util/evsel.c | 8 +- .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 114 +++++++- .../perf/util/intel-pt-decoder/intel-pt-decoder.h | 137 ++++++++++ .../util/intel-pt-decoder/intel-pt-pkt-decoder.c | 140 +++++++++- .../util/intel-pt-decoder/intel-pt-pkt-decoder.h | 21 +- tools/perf/util/intel-pt.c | 296 +++++++++++++++++++- tools/perf/util/pmu.c | 28 +- 42 files changed, 1115 insertions(+), 68 deletions(-) create mode 100644 
tools/build/feature/test-gettid.c
 create mode 100644 tools/build/feature/test-libslang-include-subdir.c
 create mode 100644 tools/perf/arch/x86/tests/intel-pt-pkt-decoder-test.c

Test results:

The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests.

Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one.

It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping having this in place.
$ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0-rc4.tar.xz $ dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 7.0.1 (tags/RELEASE_701/final) (based on LLVM 7.0.1) 9 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 10 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 11 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 13 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 14 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 15 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36) 16 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.1.1 20190611 gcc-9-branch@272162 17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 19 debian:experimental : Ok gcc (Debian 8.3.0-7) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 20 
debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 21 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 22 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 23 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 24 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 25 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) 26 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 27 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 28 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 29 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 30 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 31 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 32 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 33 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 34 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 35 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 36 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 37 fedora:31 : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31) 38 fedora:rawhide : Ok gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc31) 39 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 40 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 
3.5.2 (tags/RELEASE_352/final) 41 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 42 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 43 manjaro:latest : Ok gcc (GCC) 8.3.0, clang version 8.0.0 (tags/RELEASE_800/final) 44 openmandriva:cooker : Ok gcc (GCC) 9.1.0 20190503 (OpenMandriva) 45 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548) 46 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 47 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 48 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190520 [gcc-9-branch revision 271396], clang version 7.0.1 (tags/RELEASE_701/final 349238) 49 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 50 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 51 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 52 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4) 53 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 54 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 55 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 56 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 57 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 58 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 59 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 
5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 60 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 61 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04) 7.4.0 62 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04) 7.4.0 63 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 64 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 65 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 66 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 67 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 68 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 69 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 70 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 71 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 72 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 73 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 74 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 75 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 $ # uname -a Linux quaco 5.2.0-rc4+ #1 SMP Tue Jun 11 11:21:27 -03 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 3469fa84c163 tools build: Fix the zstd test in the test-all.c common case feature test # perf version --build-options perf version 5.2.rc4.gd1d5628fa057 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT 
libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: 
Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: Intel PT packet decoder : Ok 66: x86 bp modify : Ok 67: probe libc's inet_pton & backtrace it with ping : Ok 68: Use vfs_getname probe to get syscall args filenames : Ok 69: Add vfs_getname probe to get syscall args filenames : Ok 70: Check open filename arg using perf trace + vfs_getname: Ok 71: Zstd perf.data compression/decompression : Ok # $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . 
make_install_prefix_O: make install prefix=/tmp/krava
make_install_prefix_slash_O: make install prefix=/tmp/krava/
- /home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC: make FEATURE_DUMP_COPY=/home/acme/git/perf/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC LDFLAGS='-static' feature-dump
make_static_O: make LDFLAGS=-static
make_with_clangllvm_O: make LIBCLANGLLVM=1
make_with_babeltrace_O: make LIBBABELTRACE=1
make_help_O: make help
make_no_backtrace_O: make NO_BACKTRACE=1
make_install_bin_O: make install-bin
make_no_libpython_O: make NO_LIBPYTHON=1
make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
make_clean_all_O: make clean all
make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
make_no_libunwind_O: make NO_LIBUNWIND=1
make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1
make_debug_O: make DEBUG=1
make_no_libperl_O: make NO_LIBPERL=1
make_no_gtk2_O: make NO_GTK2=1
make_no_slang_O: make NO_SLANG=1
make_no_libbpf_O: make NO_LIBBPF=1
make_no_libelf_O: make NO_LIBELF=1
make_no_libaudit_O: make NO_LIBAUDIT=1
make_util_map_o_O: make util/map.o
make_util_pmu_bison_o_O: make util/pmu-bison.o
make_cscope_O: make cscope
make_no_libnuma_O: make NO_LIBNUMA=1
make_perf_o_O: make perf.o
make_no_newt_O: make NO_NEWT=1
make_no_auxtrace_O: make NO_AUXTRACE=1
make_no_libbionic_O: make NO_LIBBIONIC=1
make_install_O: make install
make_tags_O: make tags
make_doc_O: make doc
make_no_demangle_O: make NO_DEMANGLE=1
make_pure_O: make
make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
OK
make: Leaving directory '/home/acme/git/perf/tools/perf'
$
* Re: [GIT PULL] perf/core improvements and fixes 2019-06-21 17:38 Arnaldo Carvalho de Melo @ 2019-06-22 6:28 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-06-22 6:28 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Florian Fainelli, John Garry, Laura Abbott, Leo Yan, Mathieu Poirier, Raphael Gault, Suzuki K Poulose, Arnaldo Carvalho de Melo * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo, > > Please consider pulling, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit 3ce5aceb5dee298b082adfa2baa0df5a447c1b0b: > > Merge tag 'perf-core-for-mingo-5.3-20190611' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2019-06-17 20:48:14 +0200) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190621 > > for you to fetch changes up to 3469fa84c1631face938efc42b3f488a2c2504e0: > > tools build: Fix the zstd test in the test-all.c common case feature test (2019-06-18 18:44:24 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > perf trace: > > Arnaldo Carvalho de Melo: > > - Fix exclusion of not available syscall names from selector list. > > - Fixup pointer arithmetic when consuming augmented syscall args. > > Intel PT: > > Adrian Hunter: > > - Add support for decoding PEBS via PT packets. See: > > https://software.intel.com/en-us/articles/intel-sdm > May 2019 version: Vol. 3B 18.5.5.2 PEBS output to Intel® Processor Trace > > for more details about it. > > ARM64: > > John Garry: > > - Fix uncore PMU alias list for ARM64 > > Raphael Gault: > > - Compile tests unconditionally. > > cs-etm: > > Mathieu Poirier: > > - Optimize option setup for CPU-wide sessions. 
> > build: > > Florian Fainelli: > > - Don't hardcode host include path for libslang, fixing up building with it > in cross build environments. > > Arnaldo Carvalho de Melo: > > - Check if gettid() is available before providing helper, fixing the build > when using the latest glibc version, where a helper for gettid() is finally > present. > > - Fix building with libslang in systems where it is located in slang/slang.h. > > - Fix fast path test for zstd library. > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Adrian Hunter (11): > perf intel-pt: Add new packets for PEBS via PT > perf intel-pt: Add Intel PT packet decoder test > perf intel-pt: Add decoder support for PEBS via PT > perf intel-pt: Prepare to synthesize PEBS samples > perf intel-pt: Factor out common sample preparation for re-use > perf intel-pt: Synthesize PEBS sample basic information > perf intel-pt: Add gp registers to synthesized PEBS sample > perf intel-pt: Add XMM registers to synthesized PEBS sample > perf intel-pt: Add LBR information to synthesized PEBS sample > perf intel-pt: Add memory information to synthesized PEBS sample > perf intel-pt: Add callchain to synthesized PEBS sample > > Arnaldo Carvalho de Melo (10): > tools build: Check if gettid() is available before providing helper > perf trace: Fix exclusion of not available syscall names from selector list > perf trace: Streamline validation of select syscall names list > tools build feature tests: Add missing SPDX headers > perf tests: Add missing SPDX headers > perf trace: Fixup pointer arithmetic when consuming augmented syscall args > perf evsel: Make perf_evsel__name() accept a NULL argument > tools build: Add test to check if slang.h is in /usr/include/slang/ > perf build: Handle slang being in /usr/include and in /usr/include/slang/ > tools build: Fix the zstd test in the test-all.c common case feature test > > Florian Fainelli (1): > perf tools: 
Don't hardcode host include path for libslang > > John Garry (1): > perf pmu: Fix uncore PMU alias list for ARM64 > > Mathieu Poirier (1): > perf: cs-etm: Optimize option setup for CPU-wide sessions > > Raphael Gault (1): > perf tests arm64: Compile tests unconditionally > > tools/build/Makefile.feature | 3 +- > tools/build/feature/Makefile | 10 +- > tools/build/feature/test-all.c | 7 +- > tools/build/feature/test-fortify-source.c | 1 + > tools/build/feature/test-gettid.c | 11 + > tools/build/feature/test-hello.c | 1 + > tools/build/feature/test-libslang-include-subdir.c | 7 + > tools/build/feature/test-setns.c | 1 + > tools/perf/Makefile.config | 16 +- > tools/perf/arch/arm/util/cs-etm.c | 20 +- > tools/perf/arch/arm64/Build | 2 +- > tools/perf/arch/arm64/tests/Build | 2 +- > tools/perf/arch/x86/include/arch-tests.h | 1 + > tools/perf/arch/x86/tests/Build | 2 +- > tools/perf/arch/x86/tests/arch-tests.c | 4 + > .../arch/x86/tests/intel-pt-pkt-decoder-test.c | 304 +++++++++++++++++++++ > tools/perf/builtin-trace.c | 20 +- > tools/perf/jvmti/jvmti_agent.c | 2 + > tools/perf/tests/Build | 2 + > tools/perf/tests/bp_account.c | 1 + > tools/perf/tests/bpf-script-example.c | 1 + > tools/perf/tests/bpf-script-test-kbuild.c | 1 + > tools/perf/tests/bpf-script-test-prologue.c | 1 + > tools/perf/tests/bpf-script-test-relocation.c | 1 + > tools/perf/tests/bpf.c | 1 + > tools/perf/tests/map_groups.c | 1 + > tools/perf/tests/mem.c | 1 + > tools/perf/tests/mem2node.c | 1 + > tools/perf/tests/shell/lib/probe.sh | 1 + > tools/perf/tests/shell/probe_vfs_getname.sh | 3 +- > .../tests/shell/record+probe_libc_inet_pton.sh | 1 + > .../tests/shell/record+script_probe_vfs_getname.sh | 1 + > tools/perf/tests/shell/record+zstd_comp_decomp.sh | 2 + > tools/perf/tests/shell/trace+probe_vfs_getname.sh | 1 + > tools/perf/ui/libslang.h | 5 + > tools/perf/util/evsel.c | 8 +- > .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 114 +++++++- > .../perf/util/intel-pt-decoder/intel-pt-decoder.h | 
137 ++++++++++ > .../util/intel-pt-decoder/intel-pt-pkt-decoder.c | 140 +++++++++- > .../util/intel-pt-decoder/intel-pt-pkt-decoder.h | 21 +- > tools/perf/util/intel-pt.c | 296 +++++++++++++++++++- > tools/perf/util/pmu.c | 28 +- > 42 files changed, 1115 insertions(+), 68 deletions(-) > create mode 100644 tools/build/feature/test-gettid.c > create mode 100644 tools/build/feature/test-libslang-include-subdir.c > create mode 100644 tools/perf/arch/x86/tests/intel-pt-pkt-decoder-test.c Pulled, thanks a lot Arnaldo! Ingo
* [GIT PULL] perf/core improvements and fixes
@ 2019-06-11 18:57 Arnaldo Carvalho de Melo
  2019-06-17 18:48 ` Ingo Molnar
  0 siblings, 1 reply; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-06-11 18:57 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter, Alexey Budankov, Kan Liang, Leo Yan, Mathieu Poirier, Song Liu, Suzuki K Poulose, Thomas Richter, yuzhoujian, Arnaldo Carvalho de Melo

Hi Ingo,

  Please consider pulling,

Best regards,

Test results at the end of this message, as usual.

- Arnaldo

The following changes since commit 3384c78631dd722c2cdc5c57fbdd39fc1b5a9f2d:

  Merge branch 'x86/topology' into perf/core, to prepare for new patches (2019-06-03 11:58:45 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190611

for you to fetch changes up to 04c41bcb862bbec1fb225243ecf07a3219593f81:

  perf trace: Skip unknown syscalls when expanding strace like syscall groups (2019-06-10 17:50:04 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf record:

  Alexey Budankov:

  - Allow mixing --user-regs with --call-graph=dwarf, making sure that the
    minimal set of registers for DWARF unwinding is present in the set of
    user registers requested to be present in each sample, while warning
    the user that this may make callchains unreliable if more than the
    minimal set of registers is needed to unwind.

  yuzhoujian:

  - Add support to collect callchains from kernel or user space only,
    IOW allow setting the perf_event_attr.exclude_callchain_{kernel,user}
    bits from the command line.

perf trace:

  Arnaldo Carvalho de Melo:

  - Remove x86_64 specific syscall numbers from the augmented_raw_syscalls
    BPF in-kernel collector of augmented raw_syscalls:sys_{enter,exit}
    payloads, use instead the syscall numbers obtained either by the
    arch specific syscalltbl generators or from audit-libs.

  - Allow 'perf trace' to ask for the number of bytes to collect for
    string arguments, for now ask for PATH_MAX, i.e. the whole
    pathnames, which ends up being just a way to specify which syscall
    args are pathnames and thus should be read using bpf_probe_read_str().

  - Skip unknown syscalls when expanding strace like syscall groups.
    This helps using the 'string' group of syscalls to work in arm64,
    where some of the syscalls present in x86_64 that deal with
    strings, for instance 'access', are deprecated and thus should not
    be asked for tracing.

  Leo Yan:

  - Exit when failing to build eBPF program.

perf config:

  Arnaldo Carvalho de Melo:

  - Bail out when a handler returns failure for a key-value pair. This
    helps with cases where processing a key-value pair is not just a
    matter of setting some tool specific knob, involving, for instance
    building a BPF program to then attach to the list of events
    'perf trace' will use, e.g. augmented_raw_syscalls.c.

perf.data:

  Kan Liang:

  - Read and store die ID information available in new Intel processors
    in CPUID.1F in the CPU topology written in the perf.data header.

perf stat:

  Kan Liang:

  - Support per-die aggregation.

Documentation:

  Arnaldo Carvalho de Melo:

  - Update perf.data documentation about the CPU_TOPOLOGY, MEM_TOPOLOGY,
    CLOCKID and DIR_FORMAT headers.

  Song Liu:

  - Add description of headers HEADER_BPF_PROG_INFO and HEADER_BPF_BTF.

  Leo Yan:

  - Update default value for llvm.clang-bpf-cmd-template in 'man perf-config'.

JVMTI:

  Jiri Olsa:

  - Address gcc string overflow warning for strncpy()

core:

  - Remove superfluous nthreads system_wide setup in perf_evsel__alloc_fd().
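
To illustrate the 'perf config' entry above, here is a minimal Python sketch of the bail-out behaviour it describes: each key-value pair is handed to a handler, and processing stops at the first handler that reports failure. This is not the perf implementation (which is C); apply_config(), make_handler() and the sample pairs are hypothetical names chosen only for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def apply_config(pairs: Iterable[Tuple[str, str]],
                 handler: Callable[[str, str], int]) -> int:
    """Feed each key-value pair to handler, stopping at the first failure."""
    for key, value in pairs:
        ret = handler(key, value)
        if ret != 0:
            # Bail out: the remaining pairs are not processed.
            return ret
    return 0


def make_handler(seen: List[str]) -> Callable[[str, str], int]:
    """Build a hypothetical handler that records the keys it was given."""
    def handler(key: str, value: str) -> int:
        seen.append(key)
        # Assumption for the demo: pretend that building the BPF program
        # named by this value failed, so the handler returns an error.
        if key == "trace.add_events" and value.endswith(".c"):
            return -1
        return 0
    return handler


seen: List[str] = []
pairs = [
    ("report.children", "true"),
    ("trace.add_events", "augmented_raw_syscalls.c"),
    ("top.call-graph", "fp"),
]
status = apply_config(pairs, make_handler(seen))
print(status, seen)
```

In this sketch the third pair is never handed to the handler, since the second one fails first.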

Intel PT:

  Adrian Hunter:

  - Add support for samples to contain IPC ratio, collecting cycles
    information from CYC packets, showing the IPC info periodically.
    Because Intel PT does not update the cycle count on every branch or
    instruction, the incremental values will often be zero. When there
    are values, they will be the number of instructions and number of
    cycles since the last update, and thus represent the average IPC
    since the last IPC value.

    E.g.:

    # perf record --cpu 1 -m200000 -a -e intel_pt/cyc/u sleep 0.0001
    rounding mmap pages size to 1024M (262144 pages)
    [ perf record: Woken up 0 times to write data ]
    [ perf record: Captured and wrote 2.208 MB perf.data ]
    # perf script --insn-trace --xed -F+ipc,-dso,-cpu,-tid
    # <SNIP + add line numbering to make sense of IPC counts e.g.: (18/3)>
     1  cc1 63501.650479626: 7f5219ac27bf _int_free+0x3f   jnz 0x7f5219ac2af0         IPC: 0.81 (36/44)
     2  cc1 63501.650479626: 7f5219ac27c5 _int_free+0x45   cmp $0x1f, %rbp
     3  cc1 63501.650479626: 7f5219ac27c9 _int_free+0x49   jbe 0x7f5219ac2b00
     4  cc1 63501.650479626: 7f5219ac27cf _int_free+0x4f   test $0x8, %al
     5  cc1 63501.650479626: 7f5219ac27d1 _int_free+0x51   jnz 0x7f5219ac2b00
     6  cc1 63501.650479626: 7f5219ac27d7 _int_free+0x57   movq 0x13c58a(%rip), %rcx
     7  cc1 63501.650479626: 7f5219ac27de _int_free+0x5e   mov %rdi, %r12
     8  cc1 63501.650479626: 7f5219ac27e1 _int_free+0x61   movq %fs:(%rcx), %rax
     9  cc1 63501.650479626: 7f5219ac27e5 _int_free+0x65   test %rax, %rax
    10  cc1 63501.650479626: 7f5219ac27e8 _int_free+0x68   jz 0x7f5219ac2821
    11  cc1 63501.650479626: 7f5219ac27ea _int_free+0x6a   leaq -0x11(%rbp), %rdi
    12  cc1 63501.650479626: 7f5219ac27ee _int_free+0x6e   mov %rdi, %rsi
    13  cc1 63501.650479626: 7f5219ac27f1 _int_free+0x71   shr $0x4, %rsi
    14  cc1 63501.650479626: 7f5219ac27f5 _int_free+0x75   cmpq %rsi, 0x13caf4(%rip)
    15  cc1 63501.650479626: 7f5219ac27fc _int_free+0x7c   jbe 0x7f5219ac2821
    16  cc1 63501.650479626: 7f5219ac2821 _int_free+0xa1   cmpq 0x13f138(%rip), %rbp
    17  cc1 63501.650479626: 7f5219ac2828 _int_free+0xa8   jnbe 0x7f5219ac28d8
    18  cc1 63501.650479626: 7f5219ac28d8 _int_free+0x158  testb $0x2, 0x8(%rbx)
    19  cc1 63501.650479628: 7f5219ac28dc _int_free+0x15c  jnz 0x7f5219ac2ab0         IPC: 6.00 (18/3)
    <SNIP>

  - Allow using time ranges with Intel PT, i.e. these features, already
    present but not optimally usable with Intel PT, should now be:

        Select the second 10% time slice:

        $ perf script --time 10%/2

        Select from 0% to 10% time slice:

        $ perf script --time 0%-10%

        Select the first and second 10% time slices:

        $ perf script --time 10%/1,10%/2

        Select from 0% to 10% and 30% to 40% slices:

        $ perf script --time 0%-10%,30%-40%

cs-etm (ARM):

  Mathieu Poirier:

  - Add support for CPU-wide trace scenarios.

s390:

  Thomas Richter:

  - Fix missing kvm module load for s390.

  - Fix OOM error in TUI mode on s390

  - Support s390 diag event display when doing analysis on !s390
    architectures.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (38): perf intel-pt: Factor out intel_pt_update_sample_time perf intel-pt: Accumulate cycle count from CYC packets perf tools: Add IPC information to perf_sample perf intel-pt: Add support for samples to contain IPC ratio perf script: Add output of IPC ratio perf intel-pt: Record when decoding PSB+ packets perf intel-pt: Re-factor TIP cases in intel_pt_walk_to_ip perf intel-pt: Accumulate cycle count from TSC/TMA/MTC packets perf intel-pt: Document IPC usage perf thread-stack: Accumulate IPC information perf db-export: Add brief documentation perf db-export: Export IPC information perf scripts python: export-to-sqlite.py: Export IPC information perf scripts python: export-to-postgresql.py: Export IPC information perf scripts python: exported-sql-viewer.py: Add IPC information to the Branch reports perf scripts python: exported-sql-viewer.py: Add CallGraphModelParams perf scripts python: exported-sql-viewer.py: Add IPC information to Call Graph Graph perf scripts python:
exported-sql-viewer.py: Add IPC information to Call Tree perf scripts python: exported-sql-viewer.py: Select find text when find bar is activated perf auxtrace: Add perf time interval to itrace_synth_ops perf script: Set perf time interval in itrace_synth_ops perf report: Set perf time interval in itrace_synth_ops perf intel-pt: Add lookahead callback perf intel-pt: Factor out intel_pt_8b_tsc() perf intel-pt: Factor out intel_pt_reposition() perf intel-pt: Add reposition parameter to intel_pt_get_data() perf intel-pt: Add intel_pt_fast_forward() perf intel-pt: Factor out intel_pt_get_buffer() perf intel-pt: Add support for lookahead perf intel-pt: Add support for efficient time interval filtering perf time-utils: Treat time ranges consistently perf time-utils: Factor out set_percent_time() perf time-utils: Prevent percentage time range overlap perf time-utils: Fix --time documentation perf time-utils: Simplify perf_time__parse_for_ranges() error paths slightly perf time-utils: Make perf_time__parse_for_ranges() more logical perf tests: Add a test for time-utils perf time-utils: Add support for multiple explicit time intervals Alexey Budankov (1): perf record: Allow mixing --user-regs with --call-graph=dwarf Arnaldo Carvalho de Melo (13): perf data: Document memory topology header: HEADER_MEM_TOPOLOGY perf data: Document clockid header: HEADER_CLOCKID perf data: Document directory format header: HEADER_DIR_FORMAT perf augmented_raw_syscalls: Tell which args are filenames and how many bytes to copy perf augmented_raw_syscalls: Move the probe_read_str to a separate function perf augmented_raw_syscalls: Change helper to consider just the augmented_filename part perf augmented_raw_syscalls: Move reading filename to the loop perf trace: Consume the augmented_raw_syscalls payload perf trace: Associate more argument names with the filename beautifier perf config: Bail out when a handler returns failure for a key-value pair perf data: Fix perf.data documentation for 
HEADER_CPU_TOPOLOGY perf cs-etm: Remove duplicate GENMASK() define, use linux/bits.h instead perf trace: Skip unknown syscalls when expanding strace like syscall groups Jiri Olsa (2): perf jvmti: Address gcc string overflow warning for strncpy() perf evsel: Remove superfluous nthreads system_wide setup in alloc_fd() Kan Liang (5): perf cpumap: Retrieve die id information perf header: Add die information in CPU topology perf stat: Support per-die aggregation perf header: Rename "sibling cores" to "sibling sockets" perf tools: Apply new CPU topology sysfs attributes Leo Yan (3): perf symbols: Remove unused variable 'err' perf trace: Exit when failing to build eBPF program perf config: Update default value for llvm.clang-bpf-cmd-template Mathieu Poirier (18): perf cs-etm: Configure contextID tracing in CPU-wide mode perf cs-etm: Configure timestamp generation in CPU-wide mode perf cs-etm: Configure SWITCH_EVENTS in CPU-wide mode perf cs-etm: Add handling of itrace start events perf cs-etm: Add handling of switch-CPU-wide events perf cs-etm: Refactor error path in cs_etm_decoder__new() perf cs-etm: Move packet queue out of decoder structure perf cs-etm: Fix indentation in function cs_etm__process_decoder_queue() perf cs-etm: Introduce the concept of trace ID queues perf cs-etm: Get rid of unused cpu in struct cs_etm_queue perf cs-etm: Move thread to traceid_queue perf cs-etm: Move tid/pid to traceid_queue perf cs-etm: Use traceID aware memory callback API perf cs-etm: Add support for multiple traceID queues perf cs-etm: Linking PE contextID with perf thread mechanic perf cs-etm: Add notion of time to decoding code perf cs-etm: Add support for CPU-wide trace scenarios perf cs-etm: Properly set the value of 'old' and 'head' in snapshot mode Song Liu (1): perf data: Add description of header HEADER_BPF_PROG_INFO and HEADER_BPF_BTF Thomas Richter (3): perf test 6: Fix missing kvm module load for s390 perf report: Fix OOM error in TUI mode on s390 perf report: Support s390 
diag event display on x86 yuzhoujian (1): perf record: Add support to collect callchains from kernel or user space only tools/perf/Documentation/db-export.txt | 41 + tools/perf/Documentation/intel-pt.txt | 30 + tools/perf/Documentation/perf-config.txt | 9 +- tools/perf/Documentation/perf-diff.txt | 14 +- tools/perf/Documentation/perf-record.txt | 11 + tools/perf/Documentation/perf-report.txt | 9 +- tools/perf/Documentation/perf-script.txt | 14 +- tools/perf/Documentation/perf-stat.txt | 10 + tools/perf/Documentation/perf.data-file-format.txt | 97 +- tools/perf/Makefile.config | 3 + tools/perf/arch/arm/util/cs-etm.c | 313 +++++- tools/perf/builtin-record.c | 4 + tools/perf/builtin-report.c | 8 +- tools/perf/builtin-script.c | 31 +- tools/perf/builtin-stat.c | 87 +- tools/perf/builtin-trace.c | 84 +- tools/perf/examples/bpf/augmented_raw_syscalls.c | 281 ++---- tools/perf/jvmti/libjvmti.c | 4 +- tools/perf/perf.h | 2 + tools/perf/scripts/python/export-to-postgresql.py | 36 +- tools/perf/scripts/python/export-to-sqlite.py | 36 +- tools/perf/scripts/python/exported-sql-viewer.py | 294 ++++-- tools/perf/tests/Build | 1 + tools/perf/tests/builtin-test.c | 4 + tools/perf/tests/parse-events.c | 27 + tools/perf/tests/tests.h | 1 + tools/perf/tests/time-utils-test.c | 251 +++++ tools/perf/util/annotate.c | 5 +- tools/perf/util/auxtrace.h | 34 + tools/perf/util/config.c | 8 +- tools/perf/util/cpumap.c | 64 +- tools/perf/util/cpumap.h | 10 +- tools/perf/util/cputopo.c | 84 +- tools/perf/util/cputopo.h | 2 + tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 268 +++-- tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 39 +- tools/perf/util/cs-etm.c | 1026 +++++++++++++++----- tools/perf/util/cs-etm.h | 94 ++ tools/perf/util/env.c | 1 + tools/perf/util/env.h | 3 + tools/perf/util/event.h | 2 + tools/perf/util/evsel.c | 16 +- tools/perf/util/header.c | 96 +- .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 329 ++++++- .../perf/util/intel-pt-decoder/intel-pt-decoder.h | 6 + 
tools/perf/util/intel-pt.c | 354 ++++++- tools/perf/util/perf_regs.h | 4 + tools/perf/util/s390-cpumsf.c | 96 +- .../util/scripting-engines/trace-event-python.c | 8 +- tools/perf/util/smt.c | 8 +- tools/perf/util/stat-display.c | 29 +- tools/perf/util/stat-shadow.c | 1 + tools/perf/util/stat.c | 1 + tools/perf/util/stat.h | 1 + tools/perf/util/symbol-elf.c | 3 +- tools/perf/util/thread-stack.c | 14 + tools/perf/util/thread-stack.h | 4 + tools/perf/util/time-utils.c | 132 ++- 58 files changed, 3581 insertions(+), 863 deletions(-) create mode 100644 tools/perf/Documentation/db-export.txt create mode 100644 tools/perf/tests/time-utils-test.c Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. 
It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping having this in place. $ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0-rc3.tar.xz $ dm 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822 3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0 4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0 5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0 6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 alpine:edge : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 7.0.1 (tags/RELEASE_701/final) (based on LLVM 7.0.1) 8 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 9 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 10 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 11 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 12 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 13 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 14 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36) 15 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 9.0.1 20190501 (prerelease) gcc-8-branch@270761, clang version 8.0.0 (tags/RELEASE_800/final) 16 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 17 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 18 debian:experimental : Ok gcc (Debian 8.3.0-7) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final) 19 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 20 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 21 
debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0 22 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0 23 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) 24 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) 25 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) 26 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1) 27 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 28 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 29 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 30 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 31 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 32 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 33 fedora:30 : Ok gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), clang version 8.0.0 (Fedora 8.0.0-1.fc30) 34 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 35 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 36 fedora:rawhide : Ok gcc (GCC) 9.0.1 20190418 (Red Hat 9.0.1-0.14), clang version 8.0.0 (Fedora 8.0.0-2.fc31) 37 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0 38 mageia:5 : Ok gcc (GCC) 4.9.2 39 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0 40 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.0 41 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0 42 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553) 43 opensuse:tumbleweed : Ok gcc (SUSE Linux) 9.1.1 20190520 [gcc-9-branch revision 271396], clang version 7.0.1 (tags/RELEASE_701/final 349238) 44 oraclelinux:6 : Ok gcc (GCC) 4.4.7 
20120313 (Red Hat 4.4.7-23.0.1) 45 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36.0.1), clang version 3.4.2 (tags/RELEASE_34/dot2-final) 46 ubuntu:12.04.5 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 47 ubuntu:14.04.4 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4 48 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 49 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 50 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 51 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 52 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 53 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 54 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 55 ubuntu:18.04 : Ok gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 56 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04) 7.4.0 57 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04) 7.4.0 58 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 59 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 60 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 61 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 62 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 63 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 64 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 65 ubuntu:18.04-x-sparc64 : Ok 
sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0 66 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final) 67 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final) 68 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 69 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 70 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 $ # uname -a Linux quaco 5.2.0-rc1+ #1 SMP Thu May 23 10:37:55 -03 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 04c41bcb862b perf trace: Skip unknown syscalls when expanding strace like syscall groups # perf version --build-options perf version 5.2.rc3.g04c41bcb862b dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : 
Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: time 
utils : Ok 60: map_groups__merge_in : Ok 61: x86 rdpmc : Ok 62: Convert perf time to TSC : Ok 63: DWARF unwind : Ok 64: x86 instruction decoder - new instructions : Ok 65: x86 bp modify : Ok 66: probe libc's inet_pton & backtrace it with ping : Ok 67: Use vfs_getname probe to get syscall args filenames : Ok 68: Add vfs_getname probe to get syscall args filenames : Ok 69: Check open filename arg using perf trace + vfs_getname: Ok 70: Zstd perf.data compression/decompression : Ok $ make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_tags_O: make tags make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 make_doc_O: make doc make_install_prefix_O: make install prefix=/tmp/krava make_util_pmu_bison_o_O: make util/pmu-bison.o make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_slang_O: make NO_SLANG=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_no_newt_O: make NO_NEWT=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_help_O: make help make_no_libunwind_O: make NO_LIBUNWIND=1 make_install_O: make install make_no_libelf_O: make NO_LIBELF=1 make_pure_O: make make_static_O: make LDFLAGS=-static make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_debug_O: make DEBUG=1 make_no_libbpf_O: make NO_LIBBPF=1 make_no_demangle_O: make NO_DEMANGLE=1 make_perf_o_O: make perf.o make_cscope_O: make cscope make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_util_map_o_O: make util/map.o make_no_libperl_O: make NO_LIBPERL=1 
make_install_bin_O: make install-bin
make_install_prefix_slash_O: make install prefix=/tmp/krava/
make_no_gtk2_O: make NO_GTK2=1
make_clean_all_O: make clean all
OK
make: Leaving directory '/home/acme/git/perf/tools/perf'
$
* Re: [GIT PULL] perf/core improvements and fixes
  2019-06-11 18:57 Arnaldo Carvalho de Melo
@ 2019-06-17 18:48 ` Ingo Molnar
  0 siblings, 0 replies; 133+ messages in thread
From: Ingo Molnar @ 2019-06-17 18:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams,
	linux-kernel, linux-perf-users, Adrian Hunter, Alexey Budankov,
	Kan Liang, Leo Yan, Mathieu Poirier, Song Liu, Suzuki K Poulose,
	Thomas Richter, yuzhoujian, Arnaldo Carvalho de Melo

* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
>
> Please consider pulling,
>
> Best regards,
>
> Test results at the end of this message, as usual.
>
> - Arnaldo
>
> The following changes since commit 3384c78631dd722c2cdc5c57fbdd39fc1b5a9f2d:
>
>   Merge branch 'x86/topology' into perf/core, to prepare for new patches (2019-06-03 11:58:45 +0200)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.3-20190611
>
> for you to fetch changes up to 04c41bcb862bbec1fb225243ecf07a3219593f81:
>
>   perf trace: Skip unknown syscalls when expanding strace like syscall groups (2019-06-10 17:50:04 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> perf record:
>
>   Alexey Budankov:
>
>   - Allow mixing --user-regs with --call-graph=dwarf, making sure that
>     the minimal set of registers for DWARF unwinding is present in the
>     set of user registers requested to be present in each sample, while
>     warning the user that this may make callchains unreliable if more
>     than the minimal set of registers is needed to unwind.
>
>   yuzhoujian:
>
>   - Add support to collect callchains from kernel or user space only,
>     IOW allow setting the perf_event_attr.exclude_callchain_{kernel,user}
>     bits from the command line.
>
> perf trace:
>
>   Arnaldo Carvalho de Melo:
>
>   - Remove x86_64 specific syscall numbers from the augmented_raw_syscalls
>     BPF in-kernel collector of augmented raw_syscalls:sys_{enter,exit}
>     payloads, use instead the syscall numbers obtained either by the
>     arch specific syscalltbl generators or from audit-libs.
>
>   - Allow 'perf trace' to ask for the number of bytes to collect for
>     string arguments, for now ask for PATH_MAX, i.e. the whole
>     pathnames, which ends up being just a way to specify which syscall
>     args are pathnames and thus should be read using bpf_probe_read_str().
>
>   - Skip unknown syscalls when expanding strace like syscall groups.
>     This helps using the 'string' group of syscalls to work in arm64,
>     where some of the syscalls present in x86_64 that deal with
>     strings, for instance 'access', are deprecated and thus should not
>     be asked for tracing.
>
>   Leo Yan:
>
>   - Exit when failing to build eBPF program.
>
> perf config:
>
>   Arnaldo Carvalho de Melo:
>
>   - Bail out when a handler returns failure for a key-value pair. This
>     helps with cases where processing a key-value pair is not just a
>     matter of setting some tool specific knob, involving, for instance
>     building a BPF program to then attach to the list of events 'perf
>     trace' will use, e.g. augmented_raw_syscalls.c.
>
> perf.data:
>
>   Kan Liang:
>
>   - Read and store die ID information available in new Intel processors
>     in CPUID.1F in the CPU topology written in the perf.data header.
>
> perf stat:
>
>   Kan Liang:
>
>   - Support per-die aggregation.
>
> Documentation:
>
>   Arnaldo Carvalho de Melo:
>
>   - Update perf.data documentation about the CPU_TOPOLOGY, MEM_TOPOLOGY,
>     CLOCKID and DIR_FORMAT headers.
>
>   Song Liu:
>
>   - Add description of headers HEADER_BPF_PROG_INFO and HEADER_BPF_BTF.
>
>   Leo Yan:
>
>   - Update default value for llvm.clang-bpf-cmd-template in 'man perf-config'.
>
> JVMTI:
>
>   Jiri Olsa:
>
>   - Address gcc string overflow warning for strncpy()
>
> core:
>
>   - Remove superfluous nthreads system_wide setup in perf_evsel__alloc_fd().
>
> Intel PT:
>
>   Adrian Hunter:
>
>   - Add support for samples to contain IPC ratio, collecting cycles
>     information from CYC packets, showing the IPC info periodically, because
>     Intel PT does not update the cycle count on every branch or instruction,
>     the incremental values will often be zero. When there are values, they
>     will be the number of instructions and number of cycles since the last
>     update, and thus represent the average IPC since the last IPC value.
>
>     E.g.:
>
>     # perf record --cpu 1 -m200000 -a -e intel_pt/cyc/u sleep 0.0001
>     rounding mmap pages size to 1024M (262144 pages)
>     [ perf record: Woken up 0 times to write data ]
>     [ perf record: Captured and wrote 2.208 MB perf.data ]
>     # perf script --insn-trace --xed -F+ipc,-dso,-cpu,-tid
>     #
>     <SNIP + add line numbering to make sense of IPC counts e.g.: (18/3)>
>      1 cc1 63501.650479626: 7f5219ac27bf _int_free+0x3f   jnz 0x7f5219ac2af0  IPC: 0.81 (36/44)
>      2 cc1 63501.650479626: 7f5219ac27c5 _int_free+0x45   cmp $0x1f, %rbp
>      3 cc1 63501.650479626: 7f5219ac27c9 _int_free+0x49   jbe 0x7f5219ac2b00
>      4 cc1 63501.650479626: 7f5219ac27cf _int_free+0x4f   test $0x8, %al
>      5 cc1 63501.650479626: 7f5219ac27d1 _int_free+0x51   jnz 0x7f5219ac2b00
>      6 cc1 63501.650479626: 7f5219ac27d7 _int_free+0x57   movq 0x13c58a(%rip), %rcx
>      7 cc1 63501.650479626: 7f5219ac27de _int_free+0x5e   mov %rdi, %r12
>      8 cc1 63501.650479626: 7f5219ac27e1 _int_free+0x61   movq %fs:(%rcx), %rax
>      9 cc1 63501.650479626: 7f5219ac27e5 _int_free+0x65   test %rax, %rax
>     10 cc1 63501.650479626: 7f5219ac27e8 _int_free+0x68   jz 0x7f5219ac2821
>     11 cc1 63501.650479626: 7f5219ac27ea _int_free+0x6a   leaq -0x11(%rbp), %rdi
>     12 cc1 63501.650479626: 7f5219ac27ee _int_free+0x6e   mov %rdi, %rsi
>     13 cc1 63501.650479626: 7f5219ac27f1 _int_free+0x71   shr $0x4, %rsi
>     14 cc1 63501.650479626: 7f5219ac27f5 _int_free+0x75   cmpq %rsi, 0x13caf4(%rip)
>     15 cc1 63501.650479626: 7f5219ac27fc _int_free+0x7c   jbe 0x7f5219ac2821
>     16 cc1 63501.650479626: 7f5219ac2821 _int_free+0xa1   cmpq 0x13f138(%rip), %rbp
>     17 cc1 63501.650479626: 7f5219ac2828 _int_free+0xa8   jnbe 0x7f5219ac28d8
>     18 cc1 63501.650479626: 7f5219ac28d8 _int_free+0x158  testb $0x2, 0x8(%rbx)
>     19 cc1 63501.650479628: 7f5219ac28dc _int_free+0x15c  jnz 0x7f5219ac2ab0  IPC: 6.00 (18/3)
>     <SNIP>
>
>   - Allow using time ranges with Intel PT, i.e. these features, already
>     present but not optimally usable with Intel PT, should be now:
>
>       Select the second 10% time slice:
>
>       $ perf script --time 10%/2
>
>       Select from 0% to 10% time slice:
>
>       $ perf script --time 0%-10%
>
>       Select the first and second 10% time slices:
>
>       $ perf script --time 10%/1,10%/2
>
>       Select from 0% to 10% and 30% to 40% slices:
>
>       $ perf script --time 0%-10%,30%-40%
>
> cs-etm (ARM):
>
>   Mathieu Poirier:
>
>   - Add support for CPU-wide trace scenarios.
>
> s390:
>
>   Thomas Richter:
>
>   - Fix missing kvm module load for s390.
>
>   - Fix OOM error in TUI mode on s390
>
>   - Support s390 diag event display when doing analysis on !s390
>     architectures.
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Adrian Hunter (38): > perf intel-pt: Factor out intel_pt_update_sample_time > perf intel-pt: Accumulate cycle count from CYC packets > perf tools: Add IPC information to perf_sample > perf intel-pt: Add support for samples to contain IPC ratio > perf script: Add output of IPC ratio > perf intel-pt: Record when decoding PSB+ packets > perf intel-pt: Re-factor TIP cases in intel_pt_walk_to_ip > perf intel-pt: Accumulate cycle count from TSC/TMA/MTC packets > perf intel-pt: Document IPC usage > perf thread-stack: Accumulate IPC information > perf db-export: Add brief documentation > perf db-export: Export IPC information > perf scripts python: export-to-sqlite.py: Export IPC information > perf scripts python: export-to-postgresql.py: Export IPC information > perf scripts python: exported-sql-viewer.py: Add IPC information to the Branch reports > perf scripts python: exported-sql-viewer.py: Add CallGraphModelParams > perf scripts python: exported-sql-viewer.py: Add IPC information to Call Graph Graph > perf scripts python: exported-sql-viewer.py: Add IPC information to Call Tree > perf scripts python: exported-sql-viewer.py: Select find text when find bar is activated > perf auxtrace: Add perf time interval to itrace_synth_ops > perf script: Set perf time interval in itrace_synth_ops > perf report: Set perf time interval in itrace_synth_ops > perf intel-pt: Add lookahead callback > perf intel-pt: Factor out intel_pt_8b_tsc() > perf intel-pt: Factor out intel_pt_reposition() > perf intel-pt: Add reposition parameter to intel_pt_get_data() > perf intel-pt: Add intel_pt_fast_forward() > perf intel-pt: Factor out intel_pt_get_buffer() > perf intel-pt: Add support for lookahead > perf intel-pt: Add support for efficient time interval filtering > perf time-utils: Treat time ranges consistently > perf time-utils: Factor out set_percent_time() > 
perf time-utils: Prevent percentage time range overlap > perf time-utils: Fix --time documentation > perf time-utils: Simplify perf_time__parse_for_ranges() error paths slightly > perf time-utils: Make perf_time__parse_for_ranges() more logical > perf tests: Add a test for time-utils > perf time-utils: Add support for multiple explicit time intervals > > Alexey Budankov (1): > perf record: Allow mixing --user-regs with --call-graph=dwarf > > Arnaldo Carvalho de Melo (13): > perf data: Document memory topology header: HEADER_MEM_TOPOLOGY > perf data: Document clockid header: HEADER_CLOCKID > perf data: Document directory format header: HEADER_DIR_FORMAT > perf augmented_raw_syscalls: Tell which args are filenames and how many bytes to copy > perf augmented_raw_syscalls: Move the probe_read_str to a separate function > perf augmented_raw_syscalls: Change helper to consider just the augmented_filename part > perf augmented_raw_syscalls: Move reading filename to the loop > perf trace: Consume the augmented_raw_syscalls payload > perf trace: Associate more argument names with the filename beautifier > perf config: Bail out when a handler returns failure for a key-value pair > perf data: Fix perf.data documentation for HEADER_CPU_TOPOLOGY > perf cs-etm: Remove duplicate GENMASK() define, use linux/bits.h instead > perf trace: Skip unknown syscalls when expanding strace like syscall groups > > Jiri Olsa (2): > perf jvmti: Address gcc string overflow warning for strncpy() > perf evsel: Remove superfluous nthreads system_wide setup in alloc_fd() > > Kan Liang (5): > perf cpumap: Retrieve die id information > perf header: Add die information in CPU topology > perf stat: Support per-die aggregation > perf header: Rename "sibling cores" to "sibling sockets" > perf tools: Apply new CPU topology sysfs attributes > > Leo Yan (3): > perf symbols: Remove unused variable 'err' > perf trace: Exit when failing to build eBPF program > perf config: Update default value for 
llvm.clang-bpf-cmd-template > > Mathieu Poirier (18): > perf cs-etm: Configure contextID tracing in CPU-wide mode > perf cs-etm: Configure timestamp generation in CPU-wide mode > perf cs-etm: Configure SWITCH_EVENTS in CPU-wide mode > perf cs-etm: Add handling of itrace start events > perf cs-etm: Add handling of switch-CPU-wide events > perf cs-etm: Refactor error path in cs_etm_decoder__new() > perf cs-etm: Move packet queue out of decoder structure > perf cs-etm: Fix indentation in function cs_etm__process_decoder_queue() > perf cs-etm: Introduce the concept of trace ID queues > perf cs-etm: Get rid of unused cpu in struct cs_etm_queue > perf cs-etm: Move thread to traceid_queue > perf cs-etm: Move tid/pid to traceid_queue > perf cs-etm: Use traceID aware memory callback API > perf cs-etm: Add support for multiple traceID queues > perf cs-etm: Linking PE contextID with perf thread mechanic > perf cs-etm: Add notion of time to decoding code > perf cs-etm: Add support for CPU-wide trace scenarios > perf cs-etm: Properly set the value of 'old' and 'head' in snapshot mode > > Song Liu (1): > perf data: Add description of header HEADER_BPF_PROG_INFO and HEADER_BPF_BTF > > Thomas Richter (3): > perf test 6: Fix missing kvm module load for s390 > perf report: Fix OOM error in TUI mode on s390 > perf report: Support s390 diag event display on x86 > > yuzhoujian (1): > perf record: Add support to collect callchains from kernel or user space only > > tools/perf/Documentation/db-export.txt | 41 + > tools/perf/Documentation/intel-pt.txt | 30 + > tools/perf/Documentation/perf-config.txt | 9 +- > tools/perf/Documentation/perf-diff.txt | 14 +- > tools/perf/Documentation/perf-record.txt | 11 + > tools/perf/Documentation/perf-report.txt | 9 +- > tools/perf/Documentation/perf-script.txt | 14 +- > tools/perf/Documentation/perf-stat.txt | 10 + > tools/perf/Documentation/perf.data-file-format.txt | 97 +- > tools/perf/Makefile.config | 3 + > tools/perf/arch/arm/util/cs-etm.c | 313 
+++++- > tools/perf/builtin-record.c | 4 + > tools/perf/builtin-report.c | 8 +- > tools/perf/builtin-script.c | 31 +- > tools/perf/builtin-stat.c | 87 +- > tools/perf/builtin-trace.c | 84 +- > tools/perf/examples/bpf/augmented_raw_syscalls.c | 281 ++---- > tools/perf/jvmti/libjvmti.c | 4 +- > tools/perf/perf.h | 2 + > tools/perf/scripts/python/export-to-postgresql.py | 36 +- > tools/perf/scripts/python/export-to-sqlite.py | 36 +- > tools/perf/scripts/python/exported-sql-viewer.py | 294 ++++-- > tools/perf/tests/Build | 1 + > tools/perf/tests/builtin-test.c | 4 + > tools/perf/tests/parse-events.c | 27 + > tools/perf/tests/tests.h | 1 + > tools/perf/tests/time-utils-test.c | 251 +++++ > tools/perf/util/annotate.c | 5 +- > tools/perf/util/auxtrace.h | 34 + > tools/perf/util/config.c | 8 +- > tools/perf/util/cpumap.c | 64 +- > tools/perf/util/cpumap.h | 10 +- > tools/perf/util/cputopo.c | 84 +- > tools/perf/util/cputopo.h | 2 + > tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 268 +++-- > tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 39 +- > tools/perf/util/cs-etm.c | 1026 +++++++++++++++----- > tools/perf/util/cs-etm.h | 94 ++ > tools/perf/util/env.c | 1 + > tools/perf/util/env.h | 3 + > tools/perf/util/event.h | 2 + > tools/perf/util/evsel.c | 16 +- > tools/perf/util/header.c | 96 +- > .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 329 ++++++- > .../perf/util/intel-pt-decoder/intel-pt-decoder.h | 6 + > tools/perf/util/intel-pt.c | 354 ++++++- > tools/perf/util/perf_regs.h | 4 + > tools/perf/util/s390-cpumsf.c | 96 +- > .../util/scripting-engines/trace-event-python.c | 8 +- > tools/perf/util/smt.c | 8 +- > tools/perf/util/stat-display.c | 29 +- > tools/perf/util/stat-shadow.c | 1 + > tools/perf/util/stat.c | 1 + > tools/perf/util/stat.h | 1 + > tools/perf/util/symbol-elf.c | 3 +- > tools/perf/util/thread-stack.c | 14 + > tools/perf/util/thread-stack.h | 4 + > tools/perf/util/time-utils.c | 132 ++- > 58 files changed, 3581 insertions(+), 863 deletions(-) > 
create mode 100644 tools/perf/Documentation/db-export.txt
> create mode 100644 tools/perf/tests/time-utils-test.c

Pulled, thanks a lot Arnaldo!

	Ingo
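
For readers unfamiliar with the percent-based --time syntax quoted in the
changelog above (10%/2, 0%-10%, and comma-separated lists of either form),
the Python sketch below shows one way such a spec could map onto absolute
timestamps. It is not perf's implementation: the function names, the
integer-nanosecond timestamps and the simple truncating boundary handling
are assumptions made for illustration, and perf's own code in
tools/perf/util/time-utils.c may treat range edges differently.

```python
from fractions import Fraction


def _fraction_of_whole(text):
    """Parse 'P%' into an exact fraction, e.g. '10%' -> Fraction(1, 10)."""
    text = text.strip()
    if not text.endswith("%"):
        raise ValueError(f"expected a percentage, got {text!r}")
    return Fraction(text[:-1]) / 100


def percent_slice(first_ns, last_ns, spec):
    """Map one percent spec onto (start_ns, end_ns) inside [first_ns, last_ns].

    Hypothetical helper: 'P%/N' means the N-th slice of width P%,
    'A%-B%' means the range between two explicit percentages.
    """
    span = last_ns - first_ns
    if "/" in spec:
        pct, index = spec.split("/")
        width = _fraction_of_whole(pct)
        n = int(index)
        lo, hi = width * (n - 1), width * n
    else:
        start_pct, end_pct = spec.split("-")
        lo, hi = _fraction_of_whole(start_pct), _fraction_of_whole(end_pct)
    if not (0 <= lo < hi <= 1):
        raise ValueError(f"slice {spec!r} falls outside 0%-100%")
    # Exact rational arithmetic avoids floating-point drift on the bounds.
    return first_ns + int(span * lo), first_ns + int(span * hi)


def parse_time_option(first_ns, last_ns, option):
    """Split a comma-separated option string into a list of (start, end)."""
    return [percent_slice(first_ns, last_ns, s) for s in option.split(",")]
```

With a session spanning timestamps 0 to 1000, parse_time_option(0, 1000,
"10%/2") yields [(100, 200)], i.e. the second 10% slice, matching the
wording of the first example in the changelog.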
* [GIT PULL] perf/core improvements and fixes
@ 2019-05-17 19:34 Arnaldo Carvalho de Melo
  2019-05-18 8:27 ` Ingo Molnar
  0 siblings, 1 reply; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-05-17 19:34 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
	linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter,
	Alexey Budankov, Andi Kleen, Colin King, Donald Yandt,
	Florian Fainelli, Guo Ren, Jin Yao, Kan Liang, Mao Han,
	Ravi Bangoria, Stanislav Kozina, Steven Rostedt, Thomas Richter,
	Tzvetomir

Hi Ingo,

	Please consider pulling, I pulled tip/perf/urgent into
tip/perf/core, IIRC was just a fast forward at that point, yeap, just
did it again and it still is:

  $ git checkout -b t tip/perf/core
  Branch 't' set up to track remote branch 'perf/core' from 'tip'.
  Switched to a new branch 't'
  $ git merge tip/perf/urgent
  Updating d15d356887e7..c7a286577d75
  Fast-forward
  <SNIP>

	IIRC Jiri needs this for a pile of patches he submitted and that
I'll process next,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 6b89d4c1ae8596a8c9240f169ef108704de373f2:

  perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking (2019-05-10 08:04:17 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.2-20190517

for you to fetch changes up to 4fc4d8dfa056dfd48afe73b9ea3b7570ceb80b9c:

  perf stat: Support 'percore' event qualifier (2019-05-16 14:17:24 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf.data:

  Alexey Budankov:

  - Streaming compression of perf ring buffer into PERF_RECORD_COMPRESSED
    user space records, resulting in ~3-5x perf.data file size reduction
    on a variety of tested workloads, which saves storage space on larger
    server systems where perf.data size can easily reach several tens or
    even hundreds of GiBs, especially when profiling with DWARF-based
    stacks and tracing of context switches.

perf record:

  Arnaldo Carvalho de Melo:

  - Improve --user-regs/--intr-regs suggestions to overcome errors.

perf annotate:

  Jin Yao:

  - Remove hist__account_cycles() from callback, speeding up branch
    processing (perf record -b).

perf stat:

  - Add a 'percore' event qualifier, e.g.: -e
    cpu/event=0,umask=0x3,percore=1/, that sums up the event counts for
    both hardware threads in a core.

    We can already do this with --per-core, but it's often useful to do
    this together with other metrics that are collected per hardware
    thread.

    I.e. now it's possible to do this per-event, and have it mixed with
    other events not aggregated by core.

core libraries:

  Donald Yandt:

  - Check for errors when doing fgets(/proc/version).

  Jiri Olsa:

  - Speed up report for perf compiled with libunwind.

tools headers:

  Arnaldo Carvalho de Melo:

  - Update memcpy_64.S, x86's kvm.h and pt_regs.h.

arm64:

  Florian Fainelli:

  - Map Brahma-B53 CPUID to cortex-a53 events.

  - Add Cortex-A57 and Cortex-A72 events.

csky:

  Mao Han:

  - Add DWARF register mappings for libdw, allowing --call-graph=dwarf to work
    on the C-SKY arch.

x86:

  Andi Kleen/Kan Liang:

  - Add support for recording and printing XMM registers, available, for
    instance, on Icelake.

  Kan Liang:

  - Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON support.
    UPI replaced the Intel QuickPath Interconnect (QPI) in Xeon Skylake-SP.

Intel PT:

  Adrian Hunter:

  - Fix instructions sampling rate.

  - Timestamp fixes.

  - Improve exported-sql-viewer GUI, allowing, for instance, to copy'n'paste
    the trees, useful for e-mailing.

Documentation:

  Thomas Richter:

  - Add description for 'perf --debug stderr=1', which redirects stderr to stdout.

libtraceevent:

  Tzvetomir Stoyanov:

  - Add man pages for the various APIs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (9):
      perf scripts python: exported-sql-viewer.py: Move view creation
      perf scripts python: exported-sql-viewer.py: Fix error when shrinking / enlarging font
      perf scripts python: exported-sql-viewer.py: Add tree level
      perf scripts python: exported-sql-viewer.py: Add copy to clipboard
      perf scripts python: exported-sql-viewer.py: Add context menu
      perf scripts python: exported-sql-viewer.py: Add 'About' dialog box
      perf intel-pt: Fix instructions sampling rate
      perf intel-pt: Fix improved sample timestamp
      perf intel-pt: Fix sample timestamp wrt non-taken branches

Alexey Budankov (11):
      perf session: Define 'bytes_transferred' and 'bytes_compressed' metrics
      perf record: Implement COMPRESSED event record and its attributes
      perf mmap: Implement dedicated memory buffer for data compression
      perf tools: Introduce Zstd streaming based compression API
      perf record: Implement compression for serial trace streaming
      perf record: Implement compression for AIO trace streaming
      perf report: Add stub processing of compressed events for -D
      perf record: Implement -z,--compression_level[=<n>] option
      perf report: Implement perf.data record decompression
      perf inject: Enable COMPRESSED record decompression
      perf tests: Implement Zstd comp/decomp integration test

Andi Kleen (1):
      perf tools x86: Add support for recording and printing XMM registers

Arnaldo Carvalho de Melo (8):
      tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
      tools arch uapi: Sync the x86 kvm.h copy
      tools x86 uapi asm: Sync the pt_regs.h copy with the kernel sources
      tools pci: Do not delete pcitest.sh in 'make clean'
      perf record: Fix suggestion to get list of registers usable with --user-regs and --intr-regs
      perf parse-regs: Improve error output when faced with unknown register name
      perf build tests: Add NO_LIBZSTD=1 to make_minimal
      perf test zstd: Fixup verbose mode output

Colin Ian King (1):
      perf test: Fix spelling mistake "leadking" -> "leaking"

Donald Yandt (1):
      perf machine: Null-terminate version char array upon fgets(/proc/version) error

Florian Fainelli (3):
      perf vendor events arm64: Remove [[:xdigit:]] wildcard
      perf vendor events arm64: Map Brahma-B53 CPUID to cortex-a53 events
      perf vendor events arm64: Add Cortex-A57 and Cortex-A72 events

Jin Yao (4):
      perf annotate: Remove hist__account_cycles() from callback
      perf tools: Add a 'percore' event qualifier
      perf stat: Factor out aggregate counts printing
      perf stat: Support 'percore' event qualifier

Jiri Olsa (1):
      perf tools: Speed up report for perf compiled with linwunwind

Kan Liang (4):
      perf vendor events intel: Add uncore_upi JSON support
      perf parse-regs: Split parse_regs
      perf parse-regs: Add generic support for arch__intr/user_reg_mask()
      perf regs x86: Add X86 specific arch__intr_reg_mask()

Mao Han (1):
      csky: Add support for libdw

Thomas Richter (1):
      perf docs: Add description for stderr

Tzvetomir Stoyanov (27):
      tools lib traceevent: Remove hard coded install paths from pkg-config file
      tools lib traceevent: Introduce man pages
      tools lib traceevent: Add support for man pages with multiple names
      tools lib traceevent: Man
pages for tep_handler related APIs tools lib traceevent: Man page for header_page APIs tools lib traceevent: Man page for get/set cpus APIs tools lib traceevent: Man page for file endian APIs tools lib traceevent: Man page for host endian APIs tools lib traceevent: Man page for page size APIs tools lib traceevent: Man page for tep_strerror() tools lib traceevent: Man pages for event handler APIs tools lib traceevent: Man pages for function related libtraceevent APIs tools lib traceevent: Man pages for registering print function tools lib traceevent: Man page for tep_read_number() tools lib traceevent: Man pages for event find APIs tools lib traceevent: Man page for list events APIs tools lib traceevent: Man pages for libtraceevent event get APIs tools lib traceevent: Man pages for find field APIs tools lib traceevent: Man pages for get field value APIs tools lib traceevent: Man pages for print field APIs tools lib traceevent: Man page for tep_read_number_field() tools lib traceevent: Man pages for event fields APIs tools lib traceevent: Man pages for event filter APIs tools lib traceevent: Man pages for parse event APIs tools lib traceevent: Man page for tep_parse_header_page() tools lib traceevent: Man pages for APIs used to extract common fields from a record tools lib traceevent: Man pages for trace sequences APIs Zenghui Yu (1): perf jevents: Remove unused variable tools/arch/csky/include/uapi/asm/perf_regs.h | 51 ++++ tools/arch/x86/include/uapi/asm/kvm.h | 1 + tools/arch/x86/include/uapi/asm/perf_regs.h | 23 +- tools/arch/x86/lib/memcpy_64.S | 3 +- tools/lib/traceevent/Documentation/Makefile | 207 +++++++++++++ tools/lib/traceevent/Documentation/asciidoc.conf | 120 ++++++++ .../Documentation/libtraceevent-commands.txt | 153 ++++++++++ .../Documentation/libtraceevent-cpus.txt | 77 +++++ .../Documentation/libtraceevent-endian_read.txt | 78 +++++ .../Documentation/libtraceevent-event_find.txt | 103 +++++++ .../Documentation/libtraceevent-event_get.txt | 99 
++++++ .../Documentation/libtraceevent-event_list.txt | 122 ++++++++ .../Documentation/libtraceevent-field_find.txt | 118 +++++++ .../Documentation/libtraceevent-field_get_val.txt | 122 ++++++++ .../Documentation/libtraceevent-field_print.txt | 126 ++++++++ .../Documentation/libtraceevent-field_read.txt | 81 +++++ .../Documentation/libtraceevent-fields.txt | 105 +++++++ .../Documentation/libtraceevent-file_endian.txt | 91 ++++++ .../Documentation/libtraceevent-filter.txt | 209 +++++++++++++ .../Documentation/libtraceevent-func_apis.txt | 183 +++++++++++ .../Documentation/libtraceevent-func_find.txt | 88 ++++++ .../Documentation/libtraceevent-handle.txt | 101 ++++++ .../Documentation/libtraceevent-header_page.txt | 102 +++++++ .../Documentation/libtraceevent-host_endian.txt | 104 +++++++ .../Documentation/libtraceevent-long_size.txt | 78 +++++ .../Documentation/libtraceevent-page_size.txt | 82 +++++ .../Documentation/libtraceevent-parse_event.txt | 90 ++++++ .../Documentation/libtraceevent-parse_head.txt | 82 +++++ .../Documentation/libtraceevent-record_parse.txt | 137 +++++++++ .../libtraceevent-reg_event_handler.txt | 156 ++++++++++ .../Documentation/libtraceevent-reg_print_func.txt | 155 ++++++++++ .../Documentation/libtraceevent-set_flag.txt | 104 +++++++ .../Documentation/libtraceevent-strerror.txt | 85 ++++++ .../Documentation/libtraceevent-tseq.txt | 158 ++++++++++ .../lib/traceevent/Documentation/libtraceevent.txt | 203 ++++++++++++ .../lib/traceevent/Documentation/manpage-1.72.xsl | 14 + .../lib/traceevent/Documentation/manpage-base.xsl | 35 +++ .../Documentation/manpage-bold-literal.xsl | 17 ++ .../traceevent/Documentation/manpage-normal.xsl | 13 + .../Documentation/manpage-suppress-sp.xsl | 21 ++ tools/lib/traceevent/Makefile | 46 ++- tools/lib/traceevent/libtraceevent.pc.template | 4 +- tools/pci/Makefile | 4 +- tools/perf/Documentation/perf-list.txt | 12 + tools/perf/Documentation/perf-record.txt | 8 +- tools/perf/Documentation/perf-stat.txt | 4 + 
tools/perf/Documentation/perf.data-file-format.txt | 24 ++ tools/perf/Documentation/perf.txt | 2 + tools/perf/Makefile.config | 6 +- tools/perf/arch/csky/Build | 1 + tools/perf/arch/csky/Makefile | 3 + tools/perf/arch/csky/include/perf_regs.h | 100 ++++++ tools/perf/arch/csky/util/Build | 2 + tools/perf/arch/csky/util/dwarf-regs.c | 49 +++ tools/perf/arch/csky/util/unwind-libdw.c | 77 +++++ tools/perf/arch/x86/include/perf_regs.h | 26 +- tools/perf/arch/x86/util/perf_regs.c | 44 +++ tools/perf/builtin-annotate.c | 4 +- tools/perf/builtin-inject.c | 4 + tools/perf/builtin-record.c | 229 ++++++++++++-- tools/perf/builtin-report.c | 16 +- tools/perf/builtin-stat.c | 21 ++ tools/perf/perf.h | 1 + .../arm64/arm/cortex-a57-a72/core-imp-def.json | 179 +++++++++++ tools/perf/pmu-events/arch/arm64/mapfile.csv | 5 +- tools/perf/pmu-events/jevents.c | 2 +- tools/perf/scripts/python/exported-sql-viewer.py | 340 ++++++++++++++++++++- tools/perf/tests/dso-data.c | 4 +- tools/perf/tests/make | 2 +- tools/perf/tests/shell/record+zstd_comp_decomp.sh | 34 +++ tools/perf/util/Build | 2 + tools/perf/util/annotate.c | 2 +- tools/perf/util/compress.h | 53 ++++ tools/perf/util/env.h | 11 + tools/perf/util/event.c | 1 + tools/perf/util/event.h | 7 + tools/perf/util/evlist.c | 8 +- tools/perf/util/evlist.h | 2 +- tools/perf/util/evsel.c | 2 + tools/perf/util/evsel.h | 3 + tools/perf/util/header.c | 53 ++++ tools/perf/util/header.h | 1 + .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 31 +- tools/perf/util/machine.c | 3 +- tools/perf/util/mmap.c | 102 ++----- tools/perf/util/mmap.h | 16 +- tools/perf/util/parse-events.c | 27 ++ tools/perf/util/parse-events.h | 1 + tools/perf/util/parse-events.l | 1 + tools/perf/util/parse-regs-options.c | 33 +- tools/perf/util/parse-regs-options.h | 3 +- tools/perf/util/perf_regs.c | 10 + tools/perf/util/perf_regs.h | 3 + tools/perf/util/session.c | 133 +++++++- tools/perf/util/session.h | 14 + tools/perf/util/stat-display.c | 107 +++++-- 
tools/perf/util/stat.c | 8 +- tools/perf/util/thread.c | 3 +- tools/perf/util/tool.h | 2 + tools/perf/util/unwind-libunwind-local.c | 6 - tools/perf/util/unwind-libunwind.c | 10 + tools/perf/util/zstd.c | 111 +++++++ 102 files changed, 5703 insertions(+), 216 deletions(-) create mode 100644 tools/arch/csky/include/uapi/asm/perf_regs.h create mode 100644 tools/lib/traceevent/Documentation/Makefile create mode 100644 tools/lib/traceevent/Documentation/asciidoc.conf create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-commands.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-cpus.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-endian_read.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-event_find.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-event_get.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-event_list.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-field_find.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-field_get_val.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-field_print.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-field_read.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-fields.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-file_endian.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-filter.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-func_apis.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-func_find.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-handle.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-header_page.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-host_endian.txt create mode 100644 
tools/lib/traceevent/Documentation/libtraceevent-long_size.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-page_size.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-parse_event.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-parse_head.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-record_parse.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-reg_event_handler.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-reg_print_func.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-set_flag.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-strerror.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-tseq.txt create mode 100644 tools/lib/traceevent/Documentation/libtraceevent.txt create mode 100644 tools/lib/traceevent/Documentation/manpage-1.72.xsl create mode 100644 tools/lib/traceevent/Documentation/manpage-base.xsl create mode 100644 tools/lib/traceevent/Documentation/manpage-bold-literal.xsl create mode 100644 tools/lib/traceevent/Documentation/manpage-normal.xsl create mode 100644 tools/lib/traceevent/Documentation/manpage-suppress-sp.xsl create mode 100644 tools/perf/arch/csky/Build create mode 100644 tools/perf/arch/csky/Makefile create mode 100644 tools/perf/arch/csky/include/perf_regs.h create mode 100644 tools/perf/arch/csky/util/Build create mode 100644 tools/perf/arch/csky/util/dwarf-regs.c create mode 100644 tools/perf/arch/csky/util/unwind-libdw.c create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a57-a72/core-imp-def.json create mode 100755 tools/perf/tests/shell/record+zstd_comp_decomp.sh create mode 100644 tools/perf/util/zstd.c Test results: The first ones are container based builds of tools/perf with and without libelf support. 
Where clang is available, it is also used to build perf with/without libelf,
and building with LIBCLANGLLVM=1 (built-in clang) is done with both gcc and
clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled for now, as I'm switching from
using the sources in a local volume to fetching them from an http server to
build inside the container, to make it easier to build in a container cluster.
Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to the lack of multi-arch devel
packages, which are available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one performs a variety of tests exercising tools/perf/util/,
tools/lib/{bpf,traceevent,etc}, and also runs perf commands with a variety of
command line event specifications, intercepting the sys_perf_event_open syscall
to check that the perf_event_attr fields are set up as expected, among a
variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build
tools/perf/ with a variety of feature sets, exercising the build with an
incomplete set of features as well as with a complete one.

It is planned to have them run on each of the containers mentioned above, using
some container orchestration infrastructure. Get in contact if you are
interested in helping to get this in place.
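As a rough illustration of what "a variety of feature sets" means for the
build-test runs, the sketch below enumerates one make invocation per feature
set without running anything. The NO_* variables are the ones that show up in
the build-test output of this message; the helper name build_commands(), the
choice of feature sets and the /tmp/build output root are assumptions made
for the example, not how tools/perf/tests/make is implemented.

```python
# Dry-run sketch: list 'make' invocations for a small feature matrix.
# build_commands() and FEATURE_SETS are hypothetical; only the NO_* make
# variables come from the build-test output shown in this message.
import shlex

FEATURE_SETS = {
    "pure": [],                                      # complete feature set
    "no_libelf": ["NO_LIBELF=1"],
    "no_libunwind": ["NO_LIBUNWIND=1"],
    "no_scripts": ["NO_LIBPYTHON=1", "NO_LIBPERL=1"],
}


def build_commands(feature_sets, src="tools/perf", out_root="/tmp/build"):
    """Return an argv list per feature set, each with its own O= directory."""
    commands = {}
    for name, variables in feature_sets.items():
        commands[name] = ["make", "-C", src, f"O={out_root}/{name}", *variables]
    return commands


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for name, argv in build_commands(FEATURE_SETS).items():
        print(f"{name}: {shlex.join(argv)}")
```

Giving each feature set its own O= directory keeps the object files of one
configuration from being reused by another, which is the point of testing an
incomplete feature set in the first place.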
  $ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.1.0.tar.xz
  $ dm
     1 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0
     2 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822
     3 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0
     4 alpine:3.7                    : Ok   gcc (Alpine 6.4.0) 6.4.0
     5 alpine:3.8                    : Ok   gcc (Alpine 6.4.0) 6.4.0
     6 alpine:3.9                    : Ok   gcc (Alpine 8.3.0) 8.3.0
     7 alpine:edge                   : Ok   gcc (Alpine 8.3.0) 8.3.0
     8 amazonlinux:1                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
     9 amazonlinux:2                 : Ok   gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
    10 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
    11 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
    12 centos:5                      : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
    13 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
    14 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
    15 clearlinux:latest             : Ok   gcc (Clear Linux OS for Intel Architecture) 9.0.1 20190501 (prerelease) gcc-8-branch@270761
    16 debian:8                      : Ok   gcc (Debian 4.9.2-10+deb8u2) 4.9.2
    17 debian:9                      : Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
    18 debian:experimental           : Ok   gcc (Debian 8.3.0-7) 8.3.0
    19 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0
    20 debian:experimental-x-mips    : Ok   mips-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0
    21 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 8.3.0-7) 8.3.0
    22 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 8.3.0-7) 8.3.0
    23 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
    24 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
    25 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
    26 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
    27 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
    28 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
    29 fedora:26                     : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
    30 fedora:27                     : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)
    31 fedora:28                     : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2)
    32 fedora:29                     : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2)
    33 fedora:30                     : Ok   gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1)
    34 fedora:30-x-ARC-glibc         : Ok   arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
    35 fedora:30-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
    36 fedora:rawhide                : Ok   gcc (GCC) 9.0.1 20190418 (Red Hat 9.0.1-0.14)
    37 mageia:5                      : Ok   gcc (GCC) 4.9.2
    38 mageia:6                      : Ok   gcc (Mageia 5.5.0-1.mga6) 5.5.0
    39 opensuse:15.0                 : Ok   gcc (SUSE Linux) 7.4.0
    40 opensuse:15.1                 : Ok   gcc (SUSE Linux) 7.4.0
    41 opensuse:42.3                 : Ok   gcc (SUSE Linux) 4.8.5
    42 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 8.3.1 20190226 [gcc-8-branch revision 269204]
    43 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
    44 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36.0.1)
    45 ubuntu:12.04.5                : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
    46 ubuntu:14.04.4                : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
    47 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
    48 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
    49 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
    50 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
    51 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
    52 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
    53 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
    54 ubuntu:17.10                  : Ok   gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
    55 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
    56 ubuntu:18.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04) 7.4.0
    57 ubuntu:18.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04) 7.4.0
    58 ubuntu:18.04-x-m68k           : Ok   m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
    59 ubuntu:18.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
    60 ubuntu:18.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
    61 ubuntu:18.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
    62 ubuntu:18.04-x-riscv64        : Ok   riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
    63 ubuntu:18.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
    64 ubuntu:18.04-x-sh4            : Ok   sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
    65 ubuntu:18.04-x-sparc64        : Ok   sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
    66 ubuntu:18.10                  : Ok   gcc (Ubuntu 8.2.0-7ubuntu1) 8.2.0
    67 ubuntu:19.04                  : Ok   gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
    68 ubuntu:19.04-x-alpha          : Ok   alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
    69 ubuntu:19.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0
    70 ubuntu:19.04-x-hppa           : Ok   hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0

The getname_flags related tests failing at the end (tests 65, 66 and 67) are
being investigated: getname_flags() seems to have become just a tail call from
getname(); something in this area changed and we are no longer able to add a
probe at a suitable place to collect the pathname just copied from userspace.
# uname -a Linux quaco 5.1.0-rc7+ #1 SMP Thu May 2 09:47:59 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 4fc4d8dfa056 perf stat: Support 'percore' event qualifier # perf version --build-options perf version 5.1.g4fc4d8 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Test data source output : Ok 6: Parse event definition strings : Ok 7: Simple expression parser : Ok 8: PERF_RECORD_* events & perf_sample fields : Ok 9: Parse perf pmu format : Ok 10: DSO data read : Ok 11: DSO data cache : Ok 12: DSO data reopen : Ok 13: Roundtrip evsel->name : Ok 14: Parse sched tracepoints fields : Ok 15: syscalls:sys_enter_openat event fields : Ok 16: Setup struct perf_event_attr : Ok 17: Match and link multiple hists : Ok 18: 'import perf' in python : Ok 19: Breakpoint overflow signal handler : Ok 20: Breakpoint overflow sampling : Ok 21: Breakpoint accounting : Ok 22: Watchpoint : 22.1: Read Only Watchpoint : Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify 
Watchpoint : Ok 23: Number of exit events of a simple workload : Ok 24: Software clock events period values : Ok 25: Object code reading : Ok 26: Sample parsing : Ok 27: Use a dummy software event to keep tracking : Ok 28: Parse with no sample_id_all bit set : Ok 29: Filter hist entries : Ok 30: Lookup mmap thread : Ok 31: Share thread mg : Ok 32: Sort output of hist entries : Ok 33: Cumulate child hist entries : Ok 34: Track with sched_switch : Ok 35: Filter fds with revents mask in a fdarray : Ok 36: Add fd to a fdarray, making it autogrow : Ok 37: kmod_path__parse : Ok 38: Thread map : Ok 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 40: Session topology : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok 42: Synthesize thread map : Ok 43: Remove thread map : Ok 44: Synthesize cpu map : Ok 45: Synthesize stat config : Ok 46: Synthesize stat : Ok 47: Synthesize stat round : Ok 48: Synthesize attr update : Ok 49: Event times : Ok 50: Read backward ring buffer : Ok 51: Print cpu map : Ok 52: Probe SDT events : Ok 53: is_printable_array : Ok 54: Print bitmap : Ok 55: perf hooks : Ok 56: builtin clang support : Skip (not compiled in) 57: unit_number__scnprintf : Ok 58: mem2node : Ok 59: x86 rdpmc : Ok 60: Convert perf time to TSC : Ok 61: DWARF unwind : Ok 62: x86 instruction decoder - new instructions : Ok 63: x86 bp modify : Ok 64: probe libc's inet_pton & backtrace it with ping : Ok 65: Use vfs_getname probe to get syscall args filenames : FAILED! 66: Add vfs_getname probe to get syscall args filenames : FAILED! 67: Check open filename arg using perf trace + vfs_getname: FAILED! 
68: Zstd perf.data compression/decompression : Ok $ time make -C tools/perf build-test make: Entering directory '/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_doc_O: make doc make_cscope_O: make cscope make_no_newt_O: make NO_NEWT=1 make_with_clangllvm_O: make LIBCLANGLLVM=1 make_no_demangle_O: make NO_DEMANGLE=1 make_debug_O: make DEBUG=1 make_no_libelf_O: make NO_LIBELF=1 make_no_gtk2_O: make NO_GTK2=1 make_no_libunwind_O: make NO_LIBUNWIND=1 make_no_libpython_O: make NO_LIBPYTHON=1 make_perf_o_O: make perf.o make_install_O: make install make_pure_O: make make_util_map_o_O: make util/map.o make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_no_backtrace_O: make NO_BACKTRACE=1 make_static_O: make LDFLAGS=-static make_install_prefix_slash_O: make install prefix=/tmp/krava/ make_no_libaudit_O: make NO_LIBAUDIT=1 make_no_libnuma_O: make NO_LIBNUMA=1 make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1 make_no_libbionic_O: make NO_LIBBIONIC=1 make_no_libperl_O: make NO_LIBPERL=1 make_clean_all_O: make clean all make_util_pmu_bison_o_O: make util/pmu-bison.o make_with_babeltrace_O: make LIBBABELTRACE=1 make_no_slang_O: make NO_SLANG=1 make_install_prefix_O: make install prefix=/tmp/krava make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 make_help_O: make help make_no_libbpf_O: make NO_LIBBPF=1 make_install_bin_O: make install-bin make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 make_no_auxtrace_O: make NO_AUXTRACE=1 make_tags_O: make tags OK make: Leaving directory '/home/acme/git/perf/tools/perf' $ ^ permalink raw reply [flat|nested] 133+ messages in thread
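For readers who want a concrete picture of the 'percore' qualifier from the
changelog of this pull request, here is a small sketch of the aggregation it
describes: counts of an event marked percore are summed over the hardware
threads that share a core, while other events stay per hardware thread. The
CPU-to-core map, the counts and the aggregate() helper are all made up for
the illustration; this is not perf's implementation.

```python
# Illustration only: assumed topology and invented counts, hypothetical helper.
from collections import defaultdict

# cpu -> core id, two hardware threads per core (assumed topology)
CPU_TO_CORE = {0: 0, 1: 1, 2: 0, 3: 1}


def aggregate(per_cpu_counts, percore):
    """Sum counts per core when percore is set, else keep them per CPU."""
    if not percore:
        return {f"CPU{cpu}": n for cpu, n in sorted(per_cpu_counts.items())}
    totals = defaultdict(int)
    for cpu, n in per_cpu_counts.items():
        totals[f"C{CPU_TO_CORE[cpu]}"] += n
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # One event with the qualifier, one without, mixed in the same run.
    events = {
        "cpu/event=0,umask=0x3,percore=1/": ({0: 10, 1: 20, 2: 30, 3: 40}, True),
        "cycles": ({0: 100, 1: 200, 2: 300, 3: 400}, False),
    }
    for name, (counts, percore) in events.items():
        print(name, aggregate(counts, percore))
```

The per-event flag is what distinguishes this from --per-core, which applies
the same aggregation to every event in the session.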
* Re: [GIT PULL] perf/core improvements and fixes 2019-05-17 19:34 Arnaldo Carvalho de Melo @ 2019-05-18 8:27 ` Ingo Molnar 0 siblings, 0 replies; 133+ messages in thread From: Ingo Molnar @ 2019-05-18 8:27 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Thomas Gleixner, Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter, Alexey Budankov, Andi Kleen, Colin King, Donald Yandt, Florian Fainelli, Guo Ren, Jin Yao, Kan Liang, Mao Han, Ravi Bangoria, Stanislav Kozina, Steven Rostedt, Thomas Richter, Tzvetomir * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Hi Ingo, > > Please consider pulling, I pulled tip/perf/urgent into > tip/pref/core, IIRC was just a fast forward at that point, yeap, just > did it again and it still is: > > $ git checkout -b t tip/perf/core > Branch 't' set up to track remote branch 'perf/core' from 'tip'. > Switched to a new branch 't' > $ git merge tip/perf/urgent > Updating d15d356887e7..c7a286577d75 > Fast-forward > <SNIP> > > IIRC Jiri needs this for a pile of patches he submitted and > that I'll process next, > > Best regards, > > - Arnaldo > > Test results at the end of this message, as usual. 
> > The following changes since commit 6b89d4c1ae8596a8c9240f169ef108704de373f2: > > perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking (2019-05-10 08:04:17 +0200) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.2-20190517 > > for you to fetch changes up to 4fc4d8dfa056dfd48afe73b9ea3b7570ceb80b9c: > > perf stat: Support 'percore' event qualifier (2019-05-16 14:17:24 -0300) > > ---------------------------------------------------------------- > perf/core improvements and fixes: > > perf.data: > > Alexey Budankov: > > - Streaming compression of perf ring buffer into PERF_RECORD_COMPRESSED > user space records, resulting in ~3-5x perf.data file size reduction > on variety of tested workloads what saves storage space on larger > server systems where perf.data size can easily reach several tens or > even hundreds of GiBs, especially when profiling with DWARF-based > stacks and tracing of context switches. > > perf record: > > Arnaldo Carvalho de Melo > > - Improve -user-regs/intr-regs suggestions to overcome errors. > > perf annotate: > > Jin Yao: > > - Remove hist__account_cycles() from callback, speeding up branch processing > (perf record -b). > > perf stat: > > - Add a 'percore' event qualifier, e.g.: -e cpu/event=0,umask=0x3,percore=1/, > that sums up the event counts for both hardware threads in a core. > > We can already do this with --per-core, but it's often useful to do > this together with other metrics that are collected per hardware thread. > > I.e. now its possible to do this per-event, and have it mixed with other > events not aggregated by core. > > core libraries: > > Donald Yandt: > > - Check for errors when doing fgets(/proc/version). > > Jiri Olsa: > > - Speed up report for perf compiled with linbunwind. > > tools headers: > > Arnaldo Carvalho de Melo > > - Update memcpy_64.S, x86's kvm.h and pt_regs.h. 
> > arm64: > > Florian Fainelli: > > - Map Brahma-B53 CPUID to cortex-a53 events. > > - Add Cortex-A57 and Cortex-A72 events. > > csky: > > Mao Han: > > - Add DWARF register mappings for libdw, allowing --call-graph=dwarf to work > on the C-SKY arch. > > x86: > > Andi Kleen/Kan Liang: > > - Add support for recording and printing XMM registers, available, for > instance, on Icelake. > > Kan Liang: > > - Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON support. > UPI replaced the Intel QuickPath Interconnect (QPI) in Xeon Skylake-SP. > > Intel PT: > > Adrian Hunter > > . Fix instructions sampling rate. > > . Timestamp fixes. > > . Improve exported-sql-viewer GUI, allowing, for instance, to copy'n'paste > the trees, useful for e-mailing. > > Documentation: > > Thomas Richter: > > - Add description for 'perf --debug stderr=1', which redirects stderr to stdout. > > libtraceevent: > > Tzvetomir Stoyanov: > > - Add man pages for the various APIs. > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > ---------------------------------------------------------------- > Adrian Hunter (9): > perf scripts python: exported-sql-viewer.py: Move view creation > perf scripts python: exported-sql-viewer.py: Fix error when shrinking / enlarging font > perf scripts python: exported-sql-viewer.py: Add tree level > perf scripts python: exported-sql-viewer.py: Add copy to clipboard > perf scripts python: exported-sql-viewer.py: Add context menu > perf scripts python: exported-sql-viewer.py: Add 'About' dialog box > perf intel-pt: Fix instructions sampling rate > perf intel-pt: Fix improved sample timestamp > perf intel-pt: Fix sample timestamp wrt non-taken branches > > Alexey Budankov (11): > perf session: Define 'bytes_transferred' and 'bytes_compressed' metrics > perf record: Implement COMPRESSED event record and its attributes > perf mmap: Implement dedicated memory buffer for data compression > perf tools: Introduce Zstd streaming based compression API > perf 
record: Implement compression for serial trace streaming > perf record: Implement compression for AIO trace streaming > perf report: Add stub processing of compressed events for -D > perf record: Implement -z,--compression_level[=<n>] option > perf report: Implement perf.data record decompression > perf inject: Enable COMPRESSED record decompression > perf tests: Implement Zstd comp/decomp integration test > > Andi Kleen (1): > perf tools x86: Add support for recording and printing XMM registers > > Arnaldo Carvalho de Melo (8): > tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy' > tools arch uapi: Sync the x86 kvm.h copy > tools x86 uapi asm: Sync the pt_regs.h copy with the kernel sources > tools pci: Do not delete pcitest.sh in 'make clean' > perf record: Fix suggestion to get list of registers usable with --user-regs and --intr-regs > perf parse-regs: Improve error output when faced with unknown register name > perf build tests: Add NO_LIBZSTD=1 to make_minimal > perf test zstd: Fixup verbose mode output > > Colin Ian King (1): > perf test: Fix spelling mistake "leadking" -> "leaking" > > Donald Yandt (1): > perf machine: Null-terminate version char array upon fgets(/proc/version) error > > Florian Fainelli (3): > perf vendor events arm64: Remove [[:xdigit:]] wildcard > perf vendor events arm64: Map Brahma-B53 CPUID to cortex-a53 events > perf vendor events arm64: Add Cortex-A57 and Cortex-A72 events > > Jin Yao (4): > perf annotate: Remove hist__account_cycles() from callback > perf tools: Add a 'percore' event qualifier > perf stat: Factor out aggregate counts printing > perf stat: Support 'percore' event qualifier > > Jiri Olsa (1): > perf tools: Speed up report for perf compiled with linwunwind > > Kan Liang (4): > perf vendor events intel: Add uncore_upi JSON support > perf parse-regs: Split parse_regs > perf parse-regs: Add generic support for arch__intr/user_reg_mask() > perf regs x86: Add X86 specific arch__intr_reg_mask() 
>
> Mao Han (1):
>       csky: Add support for libdw
>
> Thomas Richter (1):
>       perf docs: Add description for stderr
>
> Tzvetomir Stoyanov (27):
>       tools lib traceevent: Remove hard coded install paths from pkg-config file
>       tools lib traceevent: Introduce man pages
>       tools lib traceevent: Add support for man pages with multiple names
>       tools lib traceevent: Man pages for tep_handler related APIs
>       tools lib traceevent: Man page for header_page APIs
>       tools lib traceevent: Man page for get/set cpus APIs
>       tools lib traceevent: Man page for file endian APIs
>       tools lib traceevent: Man page for host endian APIs
>       tools lib traceevent: Man page for page size APIs
>       tools lib traceevent: Man page for tep_strerror()
>       tools lib traceevent: Man pages for event handler APIs
>       tools lib traceevent: Man pages for function related libtraceevent APIs
>       tools lib traceevent: Man pages for registering print function
>       tools lib traceevent: Man page for tep_read_number()
>       tools lib traceevent: Man pages for event find APIs
>       tools lib traceevent: Man page for list events APIs
>       tools lib traceevent: Man pages for libtraceevent event get APIs
>       tools lib traceevent: Man pages for find field APIs
>       tools lib traceevent: Man pages for get field value APIs
>       tools lib traceevent: Man pages for print field APIs
>       tools lib traceevent: Man page for tep_read_number_field()
>       tools lib traceevent: Man pages for event fields APIs
>       tools lib traceevent: Man pages for event filter APIs
>       tools lib traceevent: Man pages for parse event APIs
>       tools lib traceevent: Man page for tep_parse_header_page()
>       tools lib traceevent: Man pages for APIs used to extract common fields from a record
>       tools lib traceevent: Man pages for trace sequences APIs
>
> Zenghui Yu (1):
>       perf jevents: Remove unused variable
>
>  tools/arch/csky/include/uapi/asm/perf_regs.h | 51 ++++
>  tools/arch/x86/include/uapi/asm/kvm.h | 1 +
>  tools/arch/x86/include/uapi/asm/perf_regs.h | 23 +-
>  tools/arch/x86/lib/memcpy_64.S | 3 +-
>  tools/lib/traceevent/Documentation/Makefile | 207 +++++++++++++
>  tools/lib/traceevent/Documentation/asciidoc.conf | 120 ++++++++
>  .../Documentation/libtraceevent-commands.txt | 153 ++++++++++
>  .../Documentation/libtraceevent-cpus.txt | 77 +++++
>  .../Documentation/libtraceevent-endian_read.txt | 78 +++++
>  .../Documentation/libtraceevent-event_find.txt | 103 +++++++
>  .../Documentation/libtraceevent-event_get.txt | 99 ++++++
>  .../Documentation/libtraceevent-event_list.txt | 122 ++++++++
>  .../Documentation/libtraceevent-field_find.txt | 118 +++++++
>  .../Documentation/libtraceevent-field_get_val.txt | 122 ++++++++
>  .../Documentation/libtraceevent-field_print.txt | 126 ++++++++
>  .../Documentation/libtraceevent-field_read.txt | 81 +++++
>  .../Documentation/libtraceevent-fields.txt | 105 +++++++
>  .../Documentation/libtraceevent-file_endian.txt | 91 ++++++
>  .../Documentation/libtraceevent-filter.txt | 209 +++++++++++++
>  .../Documentation/libtraceevent-func_apis.txt | 183 +++++++++++
>  .../Documentation/libtraceevent-func_find.txt | 88 ++++++
>  .../Documentation/libtraceevent-handle.txt | 101 ++++++
>  .../Documentation/libtraceevent-header_page.txt | 102 +++++++
>  .../Documentation/libtraceevent-host_endian.txt | 104 +++++++
>  .../Documentation/libtraceevent-long_size.txt | 78 +++++
>  .../Documentation/libtraceevent-page_size.txt | 82 +++++
>  .../Documentation/libtraceevent-parse_event.txt | 90 ++++++
>  .../Documentation/libtraceevent-parse_head.txt | 82 +++++
>  .../Documentation/libtraceevent-record_parse.txt | 137 +++++++++
>  .../libtraceevent-reg_event_handler.txt | 156 ++++++++++
>  .../Documentation/libtraceevent-reg_print_func.txt | 155 ++++++++++
>  .../Documentation/libtraceevent-set_flag.txt | 104 +++++++
>  .../Documentation/libtraceevent-strerror.txt | 85 ++++++
>  .../Documentation/libtraceevent-tseq.txt | 158 ++++++++++
>  .../lib/traceevent/Documentation/libtraceevent.txt | 203 ++++++++++++
>  .../lib/traceevent/Documentation/manpage-1.72.xsl | 14 +
>  .../lib/traceevent/Documentation/manpage-base.xsl | 35 +++
>  .../Documentation/manpage-bold-literal.xsl | 17 ++
>  .../traceevent/Documentation/manpage-normal.xsl | 13 +
>  .../Documentation/manpage-suppress-sp.xsl | 21 ++
>  tools/lib/traceevent/Makefile | 46 ++-
>  tools/lib/traceevent/libtraceevent.pc.template | 4 +-
>  tools/pci/Makefile | 4 +-
>  tools/perf/Documentation/perf-list.txt | 12 +
>  tools/perf/Documentation/perf-record.txt | 8 +-
>  tools/perf/Documentation/perf-stat.txt | 4 +
>  tools/perf/Documentation/perf.data-file-format.txt | 24 ++
>  tools/perf/Documentation/perf.txt | 2 +
>  tools/perf/Makefile.config | 6 +-
>  tools/perf/arch/csky/Build | 1 +
>  tools/perf/arch/csky/Makefile | 3 +
>  tools/perf/arch/csky/include/perf_regs.h | 100 ++++++
>  tools/perf/arch/csky/util/Build | 2 +
>  tools/perf/arch/csky/util/dwarf-regs.c | 49 +++
>  tools/perf/arch/csky/util/unwind-libdw.c | 77 +++++
>  tools/perf/arch/x86/include/perf_regs.h | 26 +-
>  tools/perf/arch/x86/util/perf_regs.c | 44 +++
>  tools/perf/builtin-annotate.c | 4 +-
>  tools/perf/builtin-inject.c | 4 +
>  tools/perf/builtin-record.c | 229 ++++++++++++--
>  tools/perf/builtin-report.c | 16 +-
>  tools/perf/builtin-stat.c | 21 ++
>  tools/perf/perf.h | 1 +
>  .../arm64/arm/cortex-a57-a72/core-imp-def.json | 179 +++++++++++
>  tools/perf/pmu-events/arch/arm64/mapfile.csv | 5 +-
>  tools/perf/pmu-events/jevents.c | 2 +-
>  tools/perf/scripts/python/exported-sql-viewer.py | 340 ++++++++++++++++++++-
>  tools/perf/tests/dso-data.c | 4 +-
>  tools/perf/tests/make | 2 +-
>  tools/perf/tests/shell/record+zstd_comp_decomp.sh | 34 +++
>  tools/perf/util/Build | 2 +
>  tools/perf/util/annotate.c | 2 +-
>  tools/perf/util/compress.h | 53 ++++
>  tools/perf/util/env.h | 11 +
>  tools/perf/util/event.c | 1 +
>  tools/perf/util/event.h | 7 +
>  tools/perf/util/evlist.c | 8 +-
>  tools/perf/util/evlist.h | 2 +-
>  tools/perf/util/evsel.c | 2 +
>  tools/perf/util/evsel.h | 3 +
>  tools/perf/util/header.c | 53 ++++
>  tools/perf/util/header.h | 1 +
>  .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 31 +-
>  tools/perf/util/machine.c | 3 +-
>  tools/perf/util/mmap.c | 102 ++-----
>  tools/perf/util/mmap.h | 16 +-
>  tools/perf/util/parse-events.c | 27 ++
>  tools/perf/util/parse-events.h | 1 +
>  tools/perf/util/parse-events.l | 1 +
>  tools/perf/util/parse-regs-options.c | 33 +-
>  tools/perf/util/parse-regs-options.h | 3 +-
>  tools/perf/util/perf_regs.c | 10 +
>  tools/perf/util/perf_regs.h | 3 +
>  tools/perf/util/session.c | 133 +++++++-
>  tools/perf/util/session.h | 14 +
>  tools/perf/util/stat-display.c | 107 +++++--
>  tools/perf/util/stat.c | 8 +-
>  tools/perf/util/thread.c | 3 +-
>  tools/perf/util/tool.h | 2 +
>  tools/perf/util/unwind-libunwind-local.c | 6 -
>  tools/perf/util/unwind-libunwind.c | 10 +
>  tools/perf/util/zstd.c | 111 +++++++
>  102 files changed, 5703 insertions(+), 216 deletions(-)
>  create mode 100644 tools/arch/csky/include/uapi/asm/perf_regs.h
>  create mode 100644 tools/lib/traceevent/Documentation/Makefile
>  create mode 100644 tools/lib/traceevent/Documentation/asciidoc.conf
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-commands.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-cpus.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-endian_read.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-event_find.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-event_get.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-event_list.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-field_find.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-field_get_val.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-field_print.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-field_read.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-fields.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-file_endian.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-filter.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-func_apis.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-func_find.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-handle.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-header_page.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-host_endian.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-long_size.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-page_size.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-parse_event.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-parse_head.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-record_parse.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-reg_event_handler.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-reg_print_func.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-set_flag.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-strerror.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-tseq.txt
>  create mode 100644 tools/lib/traceevent/Documentation/libtraceevent.txt
>  create mode 100644 tools/lib/traceevent/Documentation/manpage-1.72.xsl
>  create mode 100644 tools/lib/traceevent/Documentation/manpage-base.xsl
>  create mode 100644 tools/lib/traceevent/Documentation/manpage-bold-literal.xsl
>  create mode 100644 tools/lib/traceevent/Documentation/manpage-normal.xsl
>  create mode 100644 tools/lib/traceevent/Documentation/manpage-suppress-sp.xsl
>  create mode 100644 tools/perf/arch/csky/Build
>  create mode 100644 tools/perf/arch/csky/Makefile
>  create mode 100644 tools/perf/arch/csky/include/perf_regs.h
>  create mode 100644 tools/perf/arch/csky/util/Build
>  create mode 100644 tools/perf/arch/csky/util/dwarf-regs.c
>  create mode 100644 tools/perf/arch/csky/util/unwind-libdw.c
>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a57-a72/core-imp-def.json
>  create mode 100755 tools/perf/tests/shell/record+zstd_comp_decomp.sh
>  create mode 100644 tools/perf/util/zstd.c

Pulled, thanks a lot Arnaldo!

    Ingo

^ permalink raw reply [flat|nested] 133+ messages in thread
* [GIT PULL] perf/core improvements and fixes
@ 2019-02-25 21:19 Arnaldo Carvalho de Melo
  2019-02-28  7:31 ` Ingo Molnar
  0 siblings, 1 reply; 133+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-02-25 21:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Arnaldo Carvalho de Melo,
      Arnaldo Carvalho de Melo, Adrian Hunter, Alexander Shishkin,
      Andi Kleen, Mansour Alharthi, Mathieu Poirier, Seeteena Thoufeek,
      Tony Jones, Wei Li

Hi Ingo,

    Please consider pulling, this is on top of my previous pull
request, perf-core-for-mingo-5.1-20190220.

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit b4409ae112caa6315f6ee678e953b9fc93e6919c:

  perf tools: Make rm_rf() remove single file (2019-02-20 17:09:28 -0300)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.1-20190225

for you to fetch changes up to de667cce7f4f96b6e22da8fd9c065b961f355080:

  perf script python: Add Python3 support to syscall-counts-by-pid.py (2019-02-25 17:17:13 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

perf annotate:

  Wei Li:

  - Fix getting source line failure.

perf script:

  Andi Kleen:

  - Handle missing fields with -F +...

perf data:

  Jiri Olsa:

  - Prep work to support per-cpu files in a directory.

Intel PT:

  Adrian Hunter:

  - Improve thread_stack__no_call_return()

  - Hide x86 retpolines in thread stacks.

  - exported SQL viewer refactorings, new 'top calls' report.

  Alexander Shishkin:

  - Copy parent's address filter offsets on clone.

  - Fix address filters for vmas with non-zero offset. Applies to
    ARM's CoreSight as well.

python scripts:

  Tony Jones:

  - Python3 support for several 'perf script' python scripts.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (13):
      perf thread-stack: Improve thread_stack__no_call_return()
      perf thread-stack: Hide x86 retpolines
      perf scripts python: exported-sql-viewer.py: Fix missing shebang
      perf scripts python: exported-sql-viewer.py: Remove leftover debugging prints
      perf scripts python: exported-sql-viewer.py: Hide Call Graph option if no calls table
      perf scripts python: exported-sql-viewer.py: Move column headers
      perf scripts python: exported-sql-viewer.py: Factor out ReportDialogBase
      perf scripts python: exported-sql-viewer.py: Factor out ReportVars
      perf scripts python: exported-sql-viewer.py: Move report name into ReportVars
      perf scripts python: exported-sql-viewer.py: Create new dialog data item classes
      perf scripts python: exported-sql-viewer.py: Remove SQLTableDialogDataItem
      perf scripts python: exported-sql-viewer.py: Remove no selection error
      perf scripts python: exported-sql-viewer.py: Add top calls report

Alexander Shishkin (2):
      perf: Copy parent's address filter offsets on clone
      perf, pt, coresight: Fix address filters for vmas with non-zero offset

Andi Kleen (2):
      perf script: Handle missing fields with -F +..
      perf tools: Add perf_exe() helper to find perf binary

Jiri Olsa (9):
      perf data: Move size to struct perf_data_file
      perf data: Add global path holder
      perf tools: Add depth checking to rm_rf
      perf tools: Add pattern name checking to rm_rf
      perf tools: Add rm_rf_perf_data function
      perf data: Make check_backup work over directories
      perf data: Fail check_backup in case of error
      perf data: Add perf_data__(create_dir|close_dir) functions
      perf data: Add perf_data__open_dir_data function

Tony Jones (10):
      perf script python: Add Python3 support to netdev-times.py
      perf script python: Add Python3 support to failed-syscalls-by-pid.py
      perf script python: Add Python3 support to mem-phys-addr.py
      perf script python: Add Python3 support to net_dropmonitor.py
      perf script python: Add Python3 support to powerpc-hcalls.py
      perf script python: Add Python3 support to sctop.py
      perf script python: Add Python3 support to stackcollapse.py
      perf script python: Add Python3 support to stat-cpi.py
      perf script python: Add Python3 support to syscall-counts.py
      perf script python: Add Python3 support to syscall-counts-by-pid.py

Wei Li (1):
      perf annotate: Fix getting source line failure

 arch/x86/events/intel/pt.c | 9 +-
 drivers/hwtracing/coresight/coresight-etm-perf.c | 7 +-
 include/linux/perf_event.h | 7 +-
 kernel/events/core.c | 90 ++--
 tools/perf/builtin-annotate.c | 4 +-
 tools/perf/builtin-buildid-cache.c | 4 +-
 tools/perf/builtin-buildid-list.c | 8 +-
 tools/perf/builtin-c2c.c | 4 +-
 tools/perf/builtin-diff.c | 12 +-
 tools/perf/builtin-evlist.c | 4 +-
 tools/perf/builtin-inject.c | 10 +-
 tools/perf/builtin-kmem.c | 2 +-
 tools/perf/builtin-kvm.c | 8 +-
 tools/perf/builtin-lock.c | 8 +-
 tools/perf/builtin-mem.c | 8 +-
 tools/perf/builtin-record.c | 11 +-
 tools/perf/builtin-report.c | 6 +-
 tools/perf/builtin-sched.c | 16 +-
 tools/perf/builtin-script.c | 22 +-
 tools/perf/builtin-stat.c | 6 +-
 tools/perf/builtin-timechart.c | 8 +-
 tools/perf/builtin-trace.c | 8 +-
 tools/perf/scripts/python/exported-sql-viewer.py | 510 ++++++++++++++-------
 .../perf/scripts/python/failed-syscalls-by-pid.py | 21 +-
 tools/perf/scripts/python/mem-phys-addr.py | 24 +-
 tools/perf/scripts/python/net_dropmonitor.py | 10 +-
 tools/perf/scripts/python/netdev-times.py | 82 ++--
 tools/perf/scripts/python/powerpc-hcalls.py | 18 +-
 tools/perf/scripts/python/sctop.py | 24 +-
 tools/perf/scripts/python/stackcollapse.py | 7 +-
 tools/perf/scripts/python/stat-cpi.py | 10 +-
 tools/perf/scripts/python/syscall-counts-by-pid.py | 22 +-
 tools/perf/scripts/python/syscall-counts.py | 18 +-
 tools/perf/util/annotate.c | 4 +-
 tools/perf/util/data-convert-bt.c | 4 +-
 tools/perf/util/data.c | 175 ++++++-
 tools/perf/util/data.h | 16 +-
 tools/perf/util/header.c | 12 +-
 tools/perf/util/thread-stack.c | 161 ++++++-
 tools/perf/util/util.c | 65 ++-
 tools/perf/util/util.h | 3 +
 41 files changed, 1019 insertions(+), 429 deletions(-)

Test results:

The first ones are container based builds of tools/perf with and without
libelf support. Where clang is available, it is also used to build perf
with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang)
with gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching
from using the sources in a local volume to fetching them from a http
server to build it inside the container, to make it easier to build in a
container cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and
those may not have all the features built, due to lack of multi-arch
devel packages, available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf
commands with a variety of command line event specifications to then
intercept the sys_perf_event syscall to check that the perf_event_attr
fields are set up as expected, among a variety of other unit tests.

Then there is the 'make -C tools/perf build-test' ones, that build
tools/perf/ with a variety of feature sets, exercising the build with an
incomplete set of features as well as with a complete one. It is planned
to have it run on each of the containers mentioned above, using some
container orchestration infrastructure. Get in contact if interested in
helping having this in place.

  $ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.0.0-rc5.tar.xz
  $ dm
   1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0
   2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822
   3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0
   4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0
   5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0
   6 alpine:3.9 : Ok gcc (Alpine 8.2.0) 8.2.0
   7 alpine:edge : Ok gcc (Alpine 8.2.0) 8.2.0
   8 amazonlinux:1 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
   9 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
  10 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  11 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  12 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  13 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
  14 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
  15 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 8.2.1 20180502
  16 debian:7 : Ok gcc (Debian 4.7.2-5) 4.7.2
  17 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2
  18 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
  19 debian:experimental : Ok gcc (Debian 8.2.0-17) 8.2.1 20190204
  20 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
  21 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
  22 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.2.0-16) 8.2.0
  23 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  24 fedora:21 : Ok gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
  25 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  26 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  27 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
  28 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  29 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
  30 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
  31 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)
  32 fedora:28 : Ok gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
  33 fedora:29 : Ok gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
  34 fedora:30 : Ok gcc (GCC) 9.0.1 20190203 (Red Hat 9.0.1-0.3)
  35 fedora:rawhide : Ok gcc (GCC) 9.0.0 20190119 (Red Hat 9.0.0-0.3)
  36 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 7.3.0-r3 p1.4) 7.3.0
  37 mageia:5 : Ok gcc (GCC) 4.9.2
  38 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0
  39 opensuse:13.2 : Ok gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
  40 opensuse:15.0 : Ok gcc (SUSE Linux) 7.3.1 20180323 [gcc-7-branch revision 258812]
  41 opensuse:15.1 : Ok gcc (SUSE Linux) 7.4.0
  42 opensuse:42.1 : Ok gcc (SUSE Linux) 4.8.5
  43 opensuse:42.2 : Ok gcc (SUSE Linux) 4.8.5
  44 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5
  45 opensuse:tumbleweed : Ok gcc (SUSE Linux) 8.2.1 20190103 [gcc-8-branch revision 267549]
  46 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
  47 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36.0.1)
  48 ubuntu:12.04.5 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
  49 ubuntu:14.04.4 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
  50 ubuntu:14.04.4-x-linaro-arm64 : Ok aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
  51 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
  52 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  53 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  54 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  55 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  56 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  57 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  58 ubuntu:17.10 : Ok gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
  59 ubuntu:18.04 : Ok gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
  60 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 7.3.0
  61 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 7.3.0
  62 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
  63 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
  64 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
  65 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
  66 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
  67 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
  68 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
  69 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
  70 ubuntu:18.10 : Ok gcc (Ubuntu 8.2.0-7ubuntu1) 8.2.0
  71 ubuntu:19.04 : Ok gcc (Ubuntu 8.2.0-20ubuntu1) 8.2.0
  72 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.2.0-20ubuntu1) 8.2.0
  73 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.2.0-20ubuntu1) 8.2.0
  74 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.2.0-20ubuntu1) 8.2.0
  $

  # uname -a
  Linux quaco 5.0.0-rc7+ #20 SMP Mon Feb 25 16:16:50 -03 2019 x86_64 x86_64 x86_64 GNU/Linux
  # git log --oneline -1
  de667cce7f4f perf script python: Add Python3 support to syscall-counts-by-pid.py
  # perf version --build-options
  perf version 5.0.rc5.gde667c
    dwarf: [ on ] # HAVE_DWARF_SUPPORT
    dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT
    glibc: [ on ] # HAVE_GLIBC_SUPPORT
    gtk2: [ on ] # HAVE_GTK2_SUPPORT
    syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT
    libbfd: [ on ] # HAVE_LIBBFD_SUPPORT
    libelf: [ on ] # HAVE_LIBELF_SUPPORT
    libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
    numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
    libperl: [ on ] # HAVE_LIBPERL_SUPPORT
    libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
    libslang: [ on ] # HAVE_SLANG_SUPPORT
    libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT
    libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT
    libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT
    zlib: [ on ] # HAVE_ZLIB_SUPPORT
    lzma: [ on ] # HAVE_LZMA_SUPPORT
    get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT
    bpf: [ on ] # HAVE_LIBBPF_SUPPORT
  # perf test
   1: vmlinux symtab matches kallsyms : Ok
   2: Detect openat syscall event : Ok
   3: Detect openat syscall event on all cpus : Ok
   4: Read samples using the mmap interface : Ok
   5: Test data source output : Ok
   6: Parse event definition strings : Ok
   7: Simple expression parser : Ok
   8: PERF_RECORD_* events & perf_sample fields : Ok
   9: Parse perf pmu format : Ok
  10: DSO data read : Ok
  11: DSO data cache : Ok
  12: DSO data reopen : Ok
  13: Roundtrip evsel->name : Ok
  14: Parse sched tracepoints fields : Ok
  15: syscalls:sys_enter_openat event fields : Ok
  16: Setup struct perf_event_attr : Ok
  17: Match and link multiple hists : Ok
  18: 'import perf' in python : Ok
  19: Breakpoint overflow signal handler : Ok
  20: Breakpoint overflow sampling : Ok
  21: Breakpoint accounting : Ok
  22: Watchpoint :
  22.1: Read Only Watchpoint : Skip
  22.2: Write Only Watchpoint : Ok
  22.3: Read / Write Watchpoint : Ok
  22.4: Modify Watchpoint : Ok
  23: Number of exit events of a simple workload : Ok
  24: Software clock events period values : Ok
  25: Object code reading :

  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/perf/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
  make_tags_O: make tags
  make_help_O: make help
  make_install_bin_O: make install-bin
  make_install_prefix_slash_O: make install prefix=/tmp/krava/
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
  make_no_libunwind_O: make NO_LIBUNWIND=1
  make_cscope_O: make cscope
  make_util_pmu_bison_o_O: make util/pmu-bison.o
  make_no_libbionic_O: make NO_LIBBIONIC=1
  make_install_prefix_O: make install prefix=/tmp/krava
  make_pure_O: make
  make_install_O: make install
  make_clean_all_O: make clean all
  make_no_gtk2_O: make NO_GTK2=1
  make_doc_O: make doc
  make_no_newt_O: make NO_NEWT=1
  make_no_demangle_O: make NO_DEMANGLE=1
  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
  make_no_libnuma_O: make NO_LIBNUMA=1
  make_no_libaudit_O: make NO_LIBAUDIT=1
  make_perf_o_O: make perf.o
  make_no_libperl_O: make NO_LIBPERL=1
  make_no_auxtrace_O: make NO_AUXTRACE=1
  make_no_libelf_O: make NO_LIBELF=1
  make_no_libpython_O: make NO_LIBPYTHON=1
  make_no_slang_O: make NO_SLANG=1
  make_no_libbpf_O: make NO_LIBBPF=1
  make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
  make_with_babeltrace_O: make LIBBABELTRACE=1
  make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
  make_with_clangllvm_O: make LIBCLANGLLVM=1
  make_no_backtrace_O: make NO_BACKTRACE=1
  make_static_O: make LDFLAGS=-static
  make_util_map_o_O: make util/map.o
  make_debug_O: make DEBUG=1
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $

^ permalink raw reply [flat|nested] 133+ messages in thread
* Re: [GIT PULL] perf/core improvements and fixes
  2019-02-25 21:19 Arnaldo Carvalho de Melo
@ 2019-02-28  7:31 ` Ingo Molnar
  0 siblings, 0 replies; 133+ messages in thread
From: Ingo Molnar @ 2019-02-28 7:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Clark Williams, linux-kernel,
      linux-perf-users, Arnaldo Carvalho de Melo, Adrian Hunter,
      Alexander Shishkin, Andi Kleen, Mansour Alharthi, Mathieu Poirier,
      Seeteena Thoufeek, Tony Jones, Wei Li


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
>
>     Please consider pulling, this is on top of my previous pull
> request, perf-core-for-mingo-5.1-20190220.
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit b4409ae112caa6315f6ee678e953b9fc93e6919c:
>
>   perf tools: Make rm_rf() remove single file (2019-02-20 17:09:28 -0300)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.1-20190225
>
> for you to fetch changes up to de667cce7f4f96b6e22da8fd9c065b961f355080:
>
>   perf script python: Add Python3 support to syscall-counts-by-pid.py (2019-02-25 17:17:13 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> perf annotate:
>
>   Wei Li:
>
>   - Fix getting source line failure.
>
> perf script:
>
>   Andi Kleen:
>
>   - Handle missing fields with -F +...
>
> perf data:
>
>   Jiri Olsa:
>
>   - Prep work to support per-cpu files in a directory.
>
> Intel PT:
>
>   Adrian Hunter:
>
>   - Improve thread_stack__no_call_return()
>
>   - Hide x86 retpolines in thread stacks.
>
>   - exported SQL viewer refactorings, new 'top calls' report.
>
>   Alexander Shishkin:
>
>   - Copy parent's address filter offsets on clone.
>
>   - Fix address filters for vmas with non-zero offset. Applies to
>     ARM's CoreSight as well.
>
> python scripts:
>
>   Tony Jones:
>
>   - Python3 support for several 'perf script' python scripts.
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> ----------------------------------------------------------------
> Adrian Hunter (13):
>       perf thread-stack: Improve thread_stack__no_call_return()
>       perf thread-stack: Hide x86 retpolines
>       perf scripts python: exported-sql-viewer.py: Fix missing shebang
>       perf scripts python: exported-sql-viewer.py: Remove leftover debugging prints
>       perf scripts python: exported-sql-viewer.py: Hide Call Graph option if no calls table
>       perf scripts python: exported-sql-viewer.py: Move column headers
>       perf scripts python: exported-sql-viewer.py: Factor out ReportDialogBase
>       perf scripts python: exported-sql-viewer.py: Factor out ReportVars
>       perf scripts python: exported-sql-viewer.py: Move report name into ReportVars
>       perf scripts python: exported-sql-viewer.py: Create new dialog data item classes
>       perf scripts python: exported-sql-viewer.py: Remove SQLTableDialogDataItem
>       perf scripts python: exported-sql-viewer.py: Remove no selection error
>       perf scripts python: exported-sql-viewer.py: Add top calls report
>
> Alexander Shishkin (2):
>       perf: Copy parent's address filter offsets on clone
>       perf, pt, coresight: Fix address filters for vmas with non-zero offset
>
> Andi Kleen (2):
>       perf script: Handle missing fields with -F +..
>       perf tools: Add perf_exe() helper to find perf binary
>
> Jiri Olsa (9):
>       perf data: Move size to struct perf_data_file
>       perf data: Add global path holder
>       perf tools: Add depth checking to rm_rf
>       perf tools: Add pattern name checking to rm_rf
>       perf tools: Add rm_rf_perf_data function
>       perf data: Make check_backup work over directories
>       perf data: Fail check_backup in case of error
>       perf data: Add perf_data__(create_dir|close_dir) functions
>       perf data: Add perf_data__open_dir_data function
>
> Tony Jones (10):
>       perf script python: Add Python3 support to netdev-times.py
>       perf script python: Add Python3 support to failed-syscalls-by-pid.py
>       perf script python: Add Python3 support to mem-phys-addr.py
>       perf script python: Add Python3 support to net_dropmonitor.py
>       perf script python: Add Python3 support to powerpc-hcalls.py
>       perf script python: Add Python3 support to sctop.py
>       perf script python: Add Python3 support to stackcollapse.py
>       perf script python: Add Python3 support to stat-cpi.py
>       perf script python: Add Python3 support to syscall-counts.py
>       perf script python: Add Python3 support to syscall-counts-by-pid.py
>
> Wei Li (1):
>       perf annotate: Fix getting source line failure
>
>  arch/x86/events/intel/pt.c | 9 +-
>  drivers/hwtracing/coresight/coresight-etm-perf.c | 7 +-
>  include/linux/perf_event.h | 7 +-
>  kernel/events/core.c | 90 ++--
>  tools/perf/builtin-annotate.c | 4 +-
>  tools/perf/builtin-buildid-cache.c | 4 +-
>  tools/perf/builtin-buildid-list.c | 8 +-
>  tools/perf/builtin-c2c.c | 4 +-
>  tools/perf/builtin-diff.c | 12 +-
>  tools/perf/builtin-evlist.c | 4 +-
>  tools/perf/builtin-inject.c | 10 +-
>  tools/perf/builtin-kmem.c | 2 +-
>  tools/perf/builtin-kvm.c | 8 +-
>  tools/perf/builtin-lock.c | 8 +-
>  tools/perf/builtin-mem.c | 8 +-
>  tools/perf/builtin-record.c | 11 +-
>  tools/perf/builtin-report.c | 6 +-
>  tools/perf/builtin-sched.c | 16 +-
>  tools/perf/builtin-script.c | 22 +-
>  tools/perf/builtin-stat.c | 6 +-
>  tools/perf/builtin-timechart.c | 8 +-
>  tools/perf/builtin-trace.c | 8 +-
>  tools/perf/scripts/python/exported-sql-viewer.py | 510 ++++++++++++++-------
>  .../perf/scripts/python/failed-syscalls-by-pid.py | 21 +-
>  tools/perf/scripts/python/mem-phys-addr.py | 24 +-
>  tools/perf/scripts/python/net_dropmonitor.py | 10 +-
>  tools/perf/scripts/python/netdev-times.py | 82 ++--
>  tools/perf/scripts/python/powerpc-hcalls.py | 18 +-
>  tools/perf/scripts/python/sctop.py | 24 +-
>  tools/perf/scripts/python/stackcollapse.py | 7 +-
>  tools/perf/scripts/python/stat-cpi.py | 10 +-
>  tools/perf/scripts/python/syscall-counts-by-pid.py | 22 +-
>  tools/perf/scripts/python/syscall-counts.py | 18 +-
>  tools/perf/util/annotate.c | 4 +-
>  tools/perf/util/data-convert-bt.c | 4 +-
>  tools/perf/util/data.c | 175 ++++++-
>  tools/perf/util/data.h | 16 +-
>  tools/perf/util/header.c | 12 +-
>  tools/perf/util/thread-stack.c | 161 ++++++-
>  tools/perf/util/util.c | 65 ++-
>  tools/perf/util/util.h | 3 +
>  41 files changed, 1019 insertions(+), 429 deletions(-)

Pulled, thanks a lot Arnaldo!

    Ingo

^ permalink raw reply [flat|nested] 133+ messages in thread
end of thread, other threads:[~2020-05-06 15:21 UTC | newest]

Thread overview: 133+ messages
2019-11-07 18:59 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 01/63] perf data: Correctly identify directory data files Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 02/63] perf data: Move perf_dir_version into data.h Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 03/63] perf data: Rename directory "header" file to "data" Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 04/63] perf session: Fix indent in perf_session__new()" Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 05/63] perf data: Support single perf.data file directory Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 06/63] perf record: Put a copy of kcore into the perf.data directory Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 07/63] perf llvm: Make .o saving a debug message, not an info one Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 08/63] perf cs-etm: Fix definition of macro TO_CS_QUEUE_NR Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 09/63] perf evsel: Always preserve errno while cleaning up perf_event_open failures Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 10/63] perf evsel: Avoid close(-1) Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 11/63] perf tools: Move ALLOC_LIST into a function Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 12/63] perf tools: Avoid a malloc() for array events Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 13/63] perf tests: Fix a typo Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 14/63] perf kvm: Use evlist layer api when possible Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 15/63] perf probe: Fix to find range-only function instance Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 16/63] perf probe: Walk function lines in lexical blocks Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 17/63] perf probe: Fix to show function entry line as probe-able Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 18/63] perf jevents: Fix resource leak in process_mapfile() and main() Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 19/63] perf probe: Fix wrong address verification Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 20/63] perf probe: Fix to probe a function which has no entry pc Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 21/63] perf probe: Fix to probe an inline " Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 22/63] perf probe: Fix to list probe event with correct line number Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 23/63] perf probe: Fix to show inlined function callsite without entry_pc Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 24/63] perf probe: Fix to show ranges of variables in functions " Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 25/63] perf auxtrace: Add auxtrace_cache__remove() Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 26/63] perf dso: Refactor dso_cache__read() Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 27/63] perf dso: Add dso__data_write_cache_addr() Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 28/63] perf map: Check if the map still has some refcounts on exit Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 29/63] perf map: Allow map__next() to receive a NULL arg Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 30/63] perf maps: Add for_each_entry()/_safe() iterators Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 31/63] perf map_groups: Introduce for_each_entry() and for_each_entry_safe() iterators Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 32/63] libsubcmd: Move EXTRA_FLAGS to the end to allow overriding existing flags Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 33/63] libsubcmd: Use -O0 with DEBUG=1 Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 34/63] perf tools: Splice events onto evlist even on error Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 36/63] perf vendor events intel: Update all the Intel JSON metrics from TMAM 3.6 Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 37/63] perf env: Add perf_env__numa_node() Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 38/63] perf stat: Add --per-node agregation support Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 39/63] perf tools: Fix cross compile for ARM64 Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 40/63] perf inject: Make --strip keep evsels Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 41/63] perf parse: Add parse events handle error Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 42/63] perf parse: Ensure config and str in terms are unique Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 43/63] perf parse: Add destructors for parse event terms Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 44/63] perf parse: Before yyabort-ing free components Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 45/63] perf parse: If pmu configuration fails free terms Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 46/63] perf parse: Add a deep delete for parse event terms Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 47/63] perf symbols: Remove needless checks for map->groups->machine Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 48/63] perf machine: Add kernel_dso() method Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 49/63] perf annotate: Fix heap overflow Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 50/63] perf probe: Return a better scope DIE if there is no best scope Arnaldo Carvalho de Melo
2019-11-07 18:59 ` [PATCH 51/63] perf probe: Skip end-of-sequence and non statement lines Arnaldo Carvalho de Melo
2019-11-07 19:00 ` [PATCH 52/63] perf probe: Filter out instances except for inlined subroutine and subprogram Arnaldo Carvalho de Melo
2019-11-07 19:00 ` [PATCH 53/63] perf probe: Fix to show calling lines of inlined functions Arnaldo Carvalho de Melo
2019-11-07 19:00 ` [PATCH 54/63]
perf probe: Skip overlapped location on searching variables Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 55/63] perf record: Add support for limit perf output file size Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 56/63] perf tests: Fix out of bounds memory access Arnaldo Carvalho de Melo 2019-12-16 16:07 ` Naresh Kamboju 2019-12-16 16:20 ` Greg Kroah-Hartman 2019-11-07 19:00 ` [PATCH 57/63] perf diff: Don't use hack to skip column length calculation Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 58/63] perf block: Cleanup and refactor block info functions Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 59/63] perf hist: Count the total cycles of all samples Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 60/63] perf hist: Support block formats with compare/sort/display Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 61/63] perf report: Sort by sampled cycles percent per block for stdio Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 62/63] perf report: Support --percent-limit for --total-cycles Arnaldo Carvalho de Melo 2019-11-07 19:00 ` [PATCH 63/63] perf report: Sort by sampled cycles percent per block for tui Arnaldo Carvalho de Melo 2019-11-12 11:08 ` [GIT PULL] perf/core improvements and fixes Ingo Molnar -- strict thread matches above, loose matches on Subject: below -- 2020-05-06 15:21 Arnaldo Carvalho de Melo 2020-04-20 11:52 Arnaldo Carvalho de Melo 2020-04-22 12:09 ` Ingo Molnar 2020-04-23 21:28 ` Daniel Díaz 2020-04-24 13:07 ` Arnaldo Carvalho de Melo 2020-04-24 14:10 ` Andreas Gerstmayr 2020-05-04 19:07 ` Daniel Díaz 2020-05-05 16:37 ` Arnaldo Carvalho de Melo 2020-05-05 16:57 ` Daniel Díaz 2020-05-05 17:03 ` Arnaldo Carvalho de Melo 2020-03-25 12:41 Arnaldo Carvalho de Melo 2020-03-17 21:32 Arnaldo Carvalho de Melo 2020-03-19 14:03 ` Ingo Molnar 2020-03-19 14:07 ` Arnaldo Carvalho de Melo 2020-03-10 11:15 Arnaldo Carvalho de Melo 2020-01-16 13:48 Arnaldo Carvalho de Melo 2020-01-20 8:23 ` Ingo Molnar 2020-01-06 16:06 
Arnaldo Carvalho de Melo 2020-01-10 17:50 ` Ingo Molnar 2020-01-28 19:10 ` pr-tracker-bot 2019-12-03 13:55 Arnaldo Carvalho de Melo 2019-12-04 7:51 ` Ingo Molnar 2019-11-28 13:40 Arnaldo Carvalho de Melo 2019-11-29 5:58 ` Ingo Molnar 2019-11-22 14:56 Arnaldo Carvalho de Melo 2019-11-23 8:07 ` Ingo Molnar 2019-11-19 11:32 Arnaldo Carvalho de Melo 2019-11-19 12:00 ` Ingo Molnar 2019-11-12 18:37 Arnaldo Carvalho de Melo 2019-11-15 7:35 ` Ingo Molnar 2019-10-21 13:37 Arnaldo Carvalho de Melo 2019-10-21 23:16 ` Ingo Molnar 2019-10-11 20:04 Arnaldo Carvalho de Melo 2019-10-15 5:25 ` Ingo Molnar 2019-09-26 0:31 Arnaldo Carvalho de Melo 2019-09-26 5:55 ` Ingo Molnar 2019-09-20 14:25 Arnaldo Carvalho de Melo 2019-09-20 16:15 ` Ingo Molnar 2019-09-01 12:22 Arnaldo Carvalho de Melo 2019-09-02 7:14 ` Ingo Molnar 2019-08-29 14:38 Arnaldo Carvalho de Melo 2019-08-29 18:58 ` Ingo Molnar 2019-08-27 1:36 Arnaldo Carvalho de Melo 2019-08-27 8:24 ` Ingo Molnar 2019-08-22 21:00 Arnaldo Carvalho de Melo 2019-08-23 10:30 ` Ingo Molnar 2019-08-20 19:27 Arnaldo Carvalho de Melo 2019-08-20 19:39 ` Ingo Molnar 2019-08-20 19:44 ` Arnaldo Carvalho de Melo 2019-08-16 20:16 Arnaldo Carvalho de Melo 2019-08-14 18:40 Arnaldo Carvalho de Melo 2019-07-22 17:38 Arnaldo Carvalho de Melo 2019-07-15 21:11 Arnaldo Carvalho de Melo 2019-07-09 18:31 Arnaldo Carvalho de Melo 2019-07-13 9:13 ` Ingo Molnar 2019-07-03 3:27 Arnaldo Carvalho de Melo 2019-07-03 13:56 ` Ingo Molnar 2019-07-02 2:25 Arnaldo Carvalho de Melo 2019-07-03 13:55 ` Ingo Molnar 2019-06-21 17:38 Arnaldo Carvalho de Melo 2019-06-22 6:28 ` Ingo Molnar 2019-06-11 18:57 Arnaldo Carvalho de Melo 2019-06-17 18:48 ` Ingo Molnar 2019-05-17 19:34 Arnaldo Carvalho de Melo 2019-05-18 8:27 ` Ingo Molnar 2019-02-25 21:19 Arnaldo Carvalho de Melo 2019-02-28 7:31 ` Ingo Molnar