linux-arm-kernel.lists.infradead.org archive mirror
* [GIT PULL 00/41] perf/core improvements and fixes
@ 2018-02-16 19:17 Arnaldo Carvalho de Melo
  2018-02-16 19:17 ` [PATCH 20/41] perf cs-etm: Freeing allocated memory Arnaldo Carvalho de Melo
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: linux-arm-kernel

Hi Ingo,

	Please consider pulling, this is on top of tip/perf/urgent.

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 297f9233b53a08fd457815e19f1d6f2c3389857b:

  kprobes: Propagate error from disarm_kprobe_ftrace() (2018-02-16 09:12:58 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180216

for you to fetch changes up to 21316ac6803d4a1aadd74b896db8d60a92cd1140:

  perf tests shell lib: Use a wildcard to remove the vfs_getname probe (2018-02-16 15:31:12 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

- Fix wrong jump arrow in systems with branch records with cycles,
  i.e. Intel's >= Skylake (Jin Yao)

- Fix 'perf record --per-thread' problem introduced when
  implementing 'perf stat --per-thread' (Jin Yao)

- Use arch__compare_symbol_names() to fix 'perf test vmlinux',
  which was using strcmp() to compare symbol names while the dso
  routines doing symbol lookups used the arch-overridable one, making
  this test fail on architectures that override that function
  with something other than strcmp() (Jiri Olsa)

- Add 'perf script --show-round-event' to display
  PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)

- Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)

- Use ordered_events for 'perf report --tasks', otherwise we may get
  artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
  (when they were recorded on different CPUs) (Jiri Olsa)

- Add support to display group output for non group events, i.e.
  now when one uses 'perf report --group' on a perf.data file
  recorded without explicitly grouping events with {} (e.g.
  "perf record -e '{cycles,instructions}'"), one gets the same output
  that grouping would produce, i.e. all those non-grouped events are
  shown in multiple columns, side by side (Jiri Olsa)

- Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)

- Kernel maps fixes wrt perf.data(report) versus live system (top)
  (Jiri Olsa)

- Fix memory corruption when using 'perf record -j call -g -a <application>'
  followed by 'perf report --branch-history' (Jiri Olsa)

- ARM CoreSight fixes (Mathieu Poirier)

- Add inject capability for CoreSight Traces (Robert Walker)

- Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)

- Man pages fixes (Sangwon Hong, Jaecheol Shin)

- Fix some 'perf test' cases on s/390 and x86_64 (some backtraces
  changed with a glibc update) (Thomas Richter)

- Add detailed CPUID info in the 'perf.data' headers for s/390 to
  then use it in 'perf annotate' (Thomas Richter)

- Add '--interval-count N' to 'perf stat', to use with -I, i.e.
  'perf stat -I 1000 --interval-count 2' will show stats every
   1000ms, two times (yuzhoujian)

- Add 'perf stat --timeout Nms', that will run for that many
  milliseconds and then stop, printing the counters (yuzhoujian)

- Fix description for 'perf report --mem-mode' (Andi Kleen)

- Use a wildcard to remove the vfs_getname probe in the
  'perf test' shell based test cases (Arnaldo Carvalho de Melo)
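A minimal C sketch of how the two new 'perf stat' stopping rules (--interval-count with -I, and --timeout) could cut a session short, assuming a simulated millisecond clock instead of real timers. The struct stat_opts and simulate_stat() names are hypothetical, not perf's internals, and the sketch treats -I and --timeout as alternatives rather than modelling their combination.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option set; not perf's real stat configuration. */
struct stat_opts {
	unsigned int interval_ms;    /* -I N, 0 = no interval printing */
	unsigned int interval_count; /* --interval-count N, 0 = unlimited */
	unsigned int timeout_ms;     /* --timeout N, 0 = no timeout */
};

/*
 * Simulate a session against a workload that would run for workload_ms.
 * Returns the elapsed time (ms) at which counting stops and stores the
 * number of interval snapshots printed in *prints.
 */
static unsigned int simulate_stat(const struct stat_opts *o,
				  unsigned int workload_ms,
				  unsigned int *prints)
{
	unsigned int now = 0;

	assert(o != NULL && prints != NULL);
	*prints = 0;

	if (o->interval_ms) {
		/* Wake up every interval_ms and print one snapshot. */
		while (now + o->interval_ms <= workload_ms) {
			now += o->interval_ms;
			(*prints)++;
			/* --interval-count: stop after N snapshots. */
			if (o->interval_count && *prints == o->interval_count)
				return now;
		}
		return workload_ms; /* workload exited first */
	}

	/* --timeout: stop early unless the workload exits first. */
	if (o->timeout_ms && o->timeout_ms < workload_ms)
		return o->timeout_ms;
	return workload_ms;
}
```

For 'perf stat -I 1000 --interval-count 2' against a long-running workload, the simulation stops at 2000 ms after printing two snapshots.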

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Andi Kleen (1):
      perf report: Fix description for --mem-mode

Arnaldo Carvalho de Melo (1):
      perf tests shell lib: Use a wildcard to remove the vfs_getname probe

Jaecheol Shin (1):
      perf annotate: Add missing arguments in Man page

Jin Yao (2):
      perf tools: Use target->per_thread and target->system_wide flags
      perf report: Fix wrong jump arrow

Jiri Olsa (18):
      perf record: Put new line after target override warning
      perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
      tools lib api fs: Add filename__read_xll function
      tools lib api fs: Add sysfs__read_xll function
      perf tests: Fix dwarf unwind for stripped binaries
      perf tools: Fix comment for sort__* compare functions
      perf report: Ask for ordered events for --tasks option
      perf report: Add support to display group output for non group events
      tools lib symbol: Skip non-address kallsyms line
      perf symbols: Check if we read regular file in dso__load()
      perf machine: Free root_dir in machine__init() error path
      perf machine: Move kernel mmap name into struct machine
      perf machine: Generalize machine__set_kernel_mmap()
      perf machine: Don't search for active kernel start in __machine__create_kernel_maps
      perf machine: Remove machine__load_kallsyms()
      perf tools: Do not create kernel maps in sample__resolve()
      perf tests: Use arch__compare_symbol_names to compare symbols
      perf report: Fix memory corruption in --branch-history mode --branch-history

Mathieu Poirier (3):
      perf cs-etm: Freeing allocated memory
      perf auxtrace arm: Fixing uninitialised variable
      perf cs-etm: Properly deal with cpu maps

Ravi Bangoria (3):
      tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
      perf powerpc: Generate system call table from asm/unistd.h
      perf trace powerpc: Use generated syscall table

Robert Walker (3):
      perf cs-etm: Inject capabilitity for CoreSight traces
      perf inject: Emit instruction records on ETM trace discontinuity
      coresight: Update documentation for perf usage

Sangwon Hong (2):
      perf kmem: Document a missing option & an argument
      perf mem: Document a missing option

Thomas Richter (5):
      perf record: Provide detailed information on s390 CPU
      perf annotate: Scan cpuid for s390 and save machine type
      perf cpuid: Introduce a platform specific cpuid compare function
      perf test: Fix test case 23 for s390 z/VM or KVM guests
      perf test: Fix test case inet_pton to accept inlines.

yuzhoujian (2):
      perf stat: Add support to print counts for fixed times
      perf stat: Add support to print counts after a period of time

 Documentation/trace/coresight.txt                  |  51 +++
 tools/arch/powerpc/include/uapi/asm/unistd.h       | 402 +++++++++++++++++
 tools/lib/api/fs/fs.c                              |  44 +-
 tools/lib/api/fs/fs.h                              |   2 +
 tools/lib/symbol/kallsyms.c                        |   4 +
 tools/perf/Documentation/perf-annotate.txt         |   6 +-
 tools/perf/Documentation/perf-kmem.txt             |   6 +-
 tools/perf/Documentation/perf-mem.txt              |   4 +
 tools/perf/Documentation/perf-report.txt           |   5 +-
 tools/perf/Documentation/perf-script.txt           |   3 +
 tools/perf/Documentation/perf-stat.txt             |  10 +
 tools/perf/Makefile.config                         |   2 +
 tools/perf/arch/arm/util/auxtrace.c                |   2 +-
 tools/perf/arch/arm/util/cs-etm.c                  |  51 ++-
 tools/perf/arch/powerpc/Makefile                   |  25 ++
 .../perf/arch/powerpc/entry/syscalls/mksyscalltbl  |  37 ++
 tools/perf/arch/s390/annotate/instructions.c       |  27 +-
 tools/perf/arch/s390/util/header.c                 | 148 ++++++-
 tools/perf/builtin-record.c                        |   2 +-
 tools/perf/builtin-report.c                        |   7 +-
 tools/perf/builtin-script.c                        |  17 +
 tools/perf/builtin-stat.c                          |  53 ++-
 tools/perf/check-headers.sh                        |   1 +
 tools/perf/tests/code-reading.c                    |  33 +-
 tools/perf/tests/dwarf-unwind.c                    |  46 +-
 tools/perf/tests/shell/lib/probe_vfs_getname.sh    |   2 +-
 .../perf/tests/shell/trace+probe_libc_inet_pton.sh |   6 +-
 tools/perf/tests/vmlinux-kallsyms.c                |   4 +-
 tools/perf/ui/browsers/annotate.c                  |   9 +-
 tools/perf/util/build-id.c                         |  10 +-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  74 +++-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h    |   2 +
 tools/perf/util/cs-etm.c                           | 478 ++++++++++++++++++---
 tools/perf/util/event.c                            |  16 +-
 tools/perf/util/evlist.c                           |  21 +-
 tools/perf/util/header.h                           |   1 +
 tools/perf/util/hist.c                             |   4 +-
 tools/perf/util/hist.h                             |   1 -
 tools/perf/util/machine.c                          | 145 +++----
 tools/perf/util/machine.h                          |   6 +-
 tools/perf/util/pmu.c                              |  47 +-
 tools/perf/util/sort.c                             |   7 +-
 tools/perf/util/stat.h                             |   2 +
 tools/perf/util/symbol.c                           |  13 +-
 tools/perf/util/syscalltbl.c                       |   8 +
 tools/perf/util/thread_map.c                       |   4 +-
 tools/perf/util/thread_map.h                       |   2 +-
 47 files changed, 1577 insertions(+), 273 deletions(-)
 create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h
 create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl

Test results:

The first ones are container (docker) based builds of tools/perf with and
without libelf support.  Where clang is available, it is also used to build
perf with/without libelf.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from an http server and
building them inside the container, to make it easier to build in a container
cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to a lack of multi-arch devel packages,
which are available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications, intercepting the
sys_perf_event_open syscall to check that the perf_event_attr fields are set up
as expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping to put this in place.

  On an Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz

  # dm
   1 39.82 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0
   2 57.59 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822
   3 44.30 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0
   4 42.14 alpine:edge                   : Ok   gcc (Alpine 6.4.0) 6.4.0
   5 35.50 amazonlinux:1                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
   6 42.97 amazonlinux:2                 : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
   7 26.46 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   8 27.28 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   9 24.29 centos:5                      : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  10 32.15 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  11 39.36 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  12 34.69 debian:7                      : Ok   gcc (Debian 4.7.2-5) 4.7.2
  13 37.92 debian:8                      : Ok   gcc (Debian 4.9.2-10) 4.9.2
  14 62.13 debian:9                      : Ok   gcc (Debian 6.3.0-18) 6.3.0 20170516
  15 65.51 debian:experimental           : Ok   gcc (Debian 7.2.0-18) 7.2.0
  16 38.73 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  17 68.18 debian:experimental-x-mips    : Ok   mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  18 36.21 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
  19 37.57 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  20 38.22 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  21 42.49 fedora:21                     : Ok   gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
  22 39.15 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  23 41.46 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  24 41.12 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
  25 34.92 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  26 78.28 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
  27 84.02 fedora:26                     : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
  28 95.42 fedora:27                     : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
  29 78.89 fedora:rawhide                : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-4)
  30 57.48 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0
  31 41.18 mageia:5                      : Ok   gcc (GCC) 4.9.2
  32 42.27 mageia:6                      : Ok   gcc (Mageia 5.4.0-5.mga6) 5.4.0
  33 39.66 opensuse:42.1                 : Ok   gcc (SUSE Linux) 4.8.5
  34 40.09 opensuse:42.2                 : Ok   gcc (SUSE Linux) 4.8.5
  35 41.01 opensuse:42.3                 : Ok   gcc (SUSE Linux) 4.8.5
  36 82.32 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 7.3.0
  37 31.70 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  38 38.39 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  39 30.49 ubuntu:12.04.5                : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
  40 36.44 ubuntu:14.04.4                : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
  41 32.13 ubuntu:14.04.4-x-linaro-arm64 : Ok   aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
  42 58.58 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.6) 5.4.0 20160609
  43 31.52 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  44 31.06 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  45 31.61 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  46 31.93 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609
  47 33.02 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  48 30.94 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  49 63.24 ubuntu:16.10                  : Ok   gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
  50 63.34 ubuntu:17.04                  : Ok   gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
  51 63.56 ubuntu:17.10                  : Ok   gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
  52 63.45 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.2.0-18ubuntu2) 7.2.0

  # uname -a
  Linux jouet 4.15.0-rc9+ #7 SMP Mon Jan 22 18:16:36 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms                       : Ok
   2: Detect openat syscall event                           : Ok
   3: Detect openat syscall event on all cpus               : Ok
   4: Read samples using the mmap interface                 : Ok
   5: Test data source output                               : Ok
   6: Parse event definition strings                        : Ok
   7: Simple expression parser                              : Ok
   8: PERF_RECORD_* events & perf_sample fields             : Ok
   9: Parse perf pmu format                                 : Ok
  10: DSO data read                                         : Ok
  11: DSO data cache                                        : Ok
  12: DSO data reopen                                       : Ok
  13: Roundtrip evsel->name                                 : Ok
  14: Parse sched tracepoints fields                        : Ok
  15: syscalls:sys_enter_openat event fields                : Ok
  16: Setup struct perf_event_attr                          : Ok
  17: Match and link multiple hists                         : Ok
  18: 'import perf' in python                               : Ok
  19: Breakpoint overflow signal handler                    : Ok
  20: Breakpoint overflow sampling                          : Ok
  21: Number of exit events of a simple workload            : Ok
  22: Software clock events period values                   : Ok
  23: Object code reading                                   : Ok
  24: Sample parsing                                        : Ok
  25: Use a dummy software event to keep tracking           : Ok
  26: Parse with no sample_id_all bit set                   : Ok
  27: Filter hist entries                                   : Ok
  28: Lookup mmap thread                                    : Ok
  29: Share thread mg                                       : Ok
  30: Sort output of hist entries                           : Ok
  31: Cumulate child hist entries                           : Ok
  32: Track with sched_switch                               : Ok
  33: Filter fds with revents mask in a fdarray             : Ok
  34: Add fd to a fdarray, making it autogrow               : Ok
  35: kmod_path__parse                                      : Ok
  36: Thread map                                            : Ok
  37: LLVM search and compile                               :
  37.1: Basic BPF llvm compile                              : Ok
  37.2: kbuild searching                                    : Ok
  37.3: Compile source for BPF prologue generation          : Ok
  37.4: Compile source for BPF relocation                   : Ok
  38: Session topology                                      : Ok
  39: BPF filter                                            :
  39.1: Basic BPF filtering                                 : Ok
  39.2: BPF pinning                                         : Ok
  39.3: BPF prologue generation                             : Ok
  39.4: BPF relocation checker                              : Ok
  40: Synthesize thread map                                 : Ok
  41: Remove thread map                                     : Ok
  42: Synthesize cpu map                                    : Ok
  43: Synthesize stat config                                : Ok
  44: Synthesize stat                                       : Ok
  45: Synthesize stat round                                 : Ok
  46: Synthesize attr update                                : Ok
  47: Event times                                           : Ok
  48: Read backward ring buffer                             : Ok
  49: Print cpu map                                         : Ok
  50: Probe SDT events                                      : Ok
  51: is_printable_array                                    : Ok
  52: Print bitmap                                          : Ok
  53: perf hooks                                            : Ok
  54: builtin clang support                                 : Skip (not compiled in)
  55: unit_number__scnprintf                                : Ok
  56: x86 rdpmc                                             : Ok
  57: Convert perf time to TSC                              : Ok
  58: DWARF unwind                                          : Ok
  59: x86 instruction decoder - new instructions            : Ok
  60: Use vfs_getname probe to get syscall args filenames   : Ok
  61: probe libc's inet_pton & backtrace it with ping       : Ok
  62: Check open filename arg using perf trace + vfs_getname: Ok
  63: Add vfs_getname probe to get syscall args filenames   : Ok
  # 
  
  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/perf/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
           make_no_libunwind_O: make NO_LIBUNWIND=1
           make_no_backtrace_O: make NO_BACKTRACE=1
        make_with_babeltrace_O: make LIBBABELTRACE=1
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
   make_install_prefix_slash_O: make install prefix=/tmp/krava/
                 make_static_O: make LDFLAGS=-static
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
         make_with_clangllvm_O: make LIBCLANGLLVM=1
                make_install_O: make install
                    make_doc_O: make doc
                   make_pure_O: make
            make_no_libaudit_O: make NO_LIBAUDIT=1
               make_no_slang_O: make NO_SLANG=1
                make_no_gtk2_O: make NO_GTK2=1
            make_install_bin_O: make install-bin
             make_no_libperl_O: make NO_LIBPERL=1
            make_no_demangle_O: make NO_DEMANGLE=1
                   make_tags_O: make tags
            make_no_auxtrace_O: make NO_AUXTRACE=1
       make_util_pmu_bison_o_O: make util/pmu-bison.o
         make_install_prefix_O: make install prefix=/tmp/krava
                  make_debug_O: make DEBUG=1
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
                make_no_newt_O: make NO_NEWT=1
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
             make_no_libnuma_O: make NO_LIBNUMA=1
              make_no_libelf_O: make NO_LIBELF=1
                   make_help_O: make help
              make_no_libbpf_O: make NO_LIBBPF=1
           make_no_libpython_O: make NO_LIBPYTHON=1
           make_no_libbionic_O: make NO_LIBBIONIC=1
              make_clean_all_O: make clean all
                 make_perf_o_O: make perf.o
             make_util_map_o_O: make util/map.o
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $ 


* [PATCH 20/41] perf cs-etm: Freeing allocated memory
  2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  2018-02-16 19:17 ` [PATCH 21/41] perf tools: Use target->per_thread and target->system_wide flags Arnaldo Carvalho de Melo
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: linux-arm-kernel

From: Mathieu Poirier <mathieu.poirier@linaro.org>

This patch frees all the memory allocated in function
cs_etm__alloc_queue().
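The pattern the patch applies, releasing everything the allocator took and tolerating a NULL queue, can be sketched in plain C. The struct layout, queue_alloc(), queue_free(), thread_put() and the ZFREE() macro are hypothetical stand-ins for cs_etm__alloc_queue(), cs_etm__free_queue(), thread__zput() and zfree(), not the real perf code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for perf's refcounted thread and the decoder. */
struct thread { int refcnt; };
struct decoder { int state; };

struct etm_queue {
	struct thread *thread;
	struct decoder *decoder;
	unsigned char *event_buf;
};

/* Like perf's zfree(): free the pointee, then NULL the pointer. */
#define ZFREE(pp) do { free(*(pp)); *(pp) = NULL; } while (0)

/* Drop one reference; free the thread when the last one goes away. */
static void thread_put(struct thread *t)
{
	if (!t)
		return;
	assert(t->refcnt > 0);
	if (--t->refcnt == 0)
		free(t);
}

static struct etm_queue *queue_alloc(struct thread *t)
{
	struct etm_queue *q = calloc(1, sizeof(*q));

	if (!q)
		return NULL;
	q->event_buf = malloc(4096);
	q->decoder = calloc(1, sizeof(*q->decoder));
	if (!q->event_buf || !q->decoder) {
		free(q->event_buf);
		free(q->decoder);
		free(q);
		return NULL;
	}
	if (t) {
		t->refcnt++; /* the queue holds its own reference */
		q->thread = t;
	}
	return q;
}

/* Release every resource queue_alloc() took, not just the queue itself. */
static void queue_free(void *priv)
{
	struct etm_queue *q = priv;

	if (!q) /* tolerate NULL, as the patched free routine does */
		return;
	thread_put(q->thread);
	q->thread = NULL;
	free(q->decoder);
	ZFREE(&q->event_buf);
	free(q);
}
```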

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518467557-18505-2-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cs-etm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b9f0a53dfa65..f2c98774e665 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -174,6 +174,12 @@ static void cs_etm__free_queue(void *priv)
 {
 	struct cs_etm_queue *etmq = priv;
 
+	if (!etmq)
+		return;
+
+	thread__zput(etmq->thread);
+	cs_etm_decoder__free(etmq->decoder);
+	zfree(&etmq->event_buf);
 	free(etmq);
 }
 
-- 
2.14.3


* [PATCH 21/41] perf tools: Use target->per_thread and target->system_wide flags
  2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
  2018-02-16 19:17 ` [PATCH 20/41] perf cs-etm: Freeing allocated memory Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  2018-02-16 19:17 ` [PATCH 22/41] perf auxtrace arm: Fixing uninitialised variable Arnaldo Carvalho de Melo
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: linux-arm-kernel

From: Jin Yao <yao.jin@linux.intel.com>

Mathieu Poirier reported that commit 73c0ca1eee3d ("perf thread_map:
Enumerate all threads from /proc") has a negative impact on 'perf
record --per-thread': it creates a kernel event for each thread in the
system.

Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag")
fixes this issue by adding a new target->all_threads flag.

This patch is based on Mathieu Poirier's patch, but instead of adding a new
target->all_threads flag it uses 'target->per_thread &&
target->system_wide' as the condition for the all-threads case.
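The resulting selection logic can be modelled in isolation. This is a sketch only: struct target below is a hypothetical subset of perf's, uid_t is approximated by unsigned int, and enum map_kind just names which thread_map__new_*() constructor thread_map__new_str() would end up calling.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of perf's struct target. */
struct target {
	const char *pid;
	const char *tid;
	unsigned int uid;   /* UINT_MAX means "no --uid given" */
	bool per_thread;
	bool system_wide;
};

/* Names the thread_map constructor the patched code would pick. */
enum map_kind { BY_PID, BY_UID, ALL_THREADS, BY_TID };

static enum map_kind pick_thread_map(const struct target *t)
{
	/*
	 * Only perf stat can end up with both flags set; perf record
	 * drops --per-thread when -a is also given.
	 */
	bool all_threads = t->per_thread && t->system_wide;

	if (t->pid)
		return BY_PID;
	if (!t->tid && t->uid != UINT_MAX)
		return BY_UID;
	if (all_threads)
		return ALL_THREADS;
	return BY_TID;
}
```

With '--per-thread' alone (the perf record case) the all-threads path is not taken, while perf stat with both '-a' and '--per-thread' enumerates all threads.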

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 73c0ca1eee3d ("perf thread_map: Enumerate all threads from /proc")
Link: http://lkml.kernel.org/r/1518467557-18505-3-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
[Fixed checkpatch warning about line over 80 characters]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c     | 21 ++++++++++++++++++++-
 tools/perf/util/thread_map.c |  4 ++--
 tools/perf/util/thread_map.h |  2 +-
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e5fc14e53c05..7b7d535396f7 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1086,11 +1086,30 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
 {
+	bool all_threads = (target->per_thread && target->system_wide);
 	struct cpu_map *cpus;
 	struct thread_map *threads;
 
+	/*
+	 * If specify '-a' and '--per-thread' to perf record, perf record
+	 * will override '--per-thread'. target->per_thread = false and
+	 * target->system_wide = true.
+	 *
+	 * If specify '--per-thread' only to perf record,
+	 * target->per_thread = true and target->system_wide = false.
+	 *
+	 * So target->per_thread && target->system_wide is false.
+	 * For perf record, thread_map__new_str doesn't call
+	 * thread_map__new_all_cpus. That will keep perf record's
+	 * current behavior.
+	 *
+	 * For perf stat, it allows the case that target->per_thread and
+	 * target->system_wide are all true. It means to collect system-wide
+	 * per-thread data. thread_map__new_str will call
+	 * thread_map__new_all_cpus to enumerate all threads.
+	 */
 	threads = thread_map__new_str(target->pid, target->tid, target->uid,
-				      target->per_thread);
+				      all_threads);
 
 	if (!threads)
 		return -1;
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index 3e1038f6491c..729dad8f412d 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
 }
 
 struct thread_map *thread_map__new_str(const char *pid, const char *tid,
-				       uid_t uid, bool per_thread)
+				       uid_t uid, bool all_threads)
 {
 	if (pid)
 		return thread_map__new_by_pid_str(pid);
@@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid,
 	if (!tid && uid != UINT_MAX)
 		return thread_map__new_by_uid(uid);
 
-	if (per_thread)
+	if (all_threads)
 		return thread_map__new_all_cpus();
 
 	return thread_map__new_by_tid_str(tid);
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index 0a806b99e73c..5ec91cfd1869 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map);
 void thread_map__put(struct thread_map *map);
 
 struct thread_map *thread_map__new_str(const char *pid,
-		const char *tid, uid_t uid, bool per_thread);
+		const char *tid, uid_t uid, bool all_threads);
 
 struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
 
-- 
2.14.3


* [PATCH 22/41] perf auxtrace arm: Fixing uninitialised variable
  2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
  2018-02-16 19:17 ` [PATCH 20/41] perf cs-etm: Freeing allocated memory Arnaldo Carvalho de Melo
  2018-02-16 19:17 ` [PATCH 21/41] perf tools: Use target->per_thread and target->system_wide flags Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  2018-02-16 19:17 ` [PATCH 23/41] perf cs-etm: Properly deal with cpu maps Arnaldo Carvalho de Melo
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: linux-arm-kernel

From: Mathieu Poirier <mathieu.poirier@linaro.org>

When building natively on arm64 the compiler complains that variable 'i'
may be used uninitialised, which breaks the compilation. No further
checks are needed here since variable 'found_spe' can only be true if
variable 'i' has been initialised as part of the for loop.
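A reduced C sketch of the shape of code that triggers this kind of warning, assuming a nested search similar to the one in auxtrace.c; find_matching_pmu() and its arguments are hypothetical. When there are no events the inner loop never assigns 'i'. Since 'found' then stays false, 'i' is never read, but the compiler's flow analysis (typically GCC's -Wmaybe-uninitialized, depending on version and optimisation level) may fail to prove that, so initialising 'i' is a harmless way to silence it.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the index of the first PMU type matching any event, or -1. */
static int find_matching_pmu(const int *event_types, int nr_events,
			     const int *pmu_types, int nr_pmus)
{
	bool found = false;
	int i = 0; /* only read when found is true, i.e. set by the loop */

	assert(nr_events >= 0 && nr_pmus >= 0);

	for (int e = 0; e < nr_events && !found; e++) {
		for (i = 0; i < nr_pmus; i++) {
			if (event_types[e] == pmu_types[i]) {
				found = true;
				break; /* i now indexes the match */
			}
		}
	}

	return found ? i : -1;
}
```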

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518467557-18505-4-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/arm/util/auxtrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index 2323581b157d..fa639e3e52ac 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -68,7 +68,7 @@ struct auxtrace_record
 	bool found_spe = false;
 	static struct perf_pmu **arm_spe_pmus = NULL;
 	static int nr_spes = 0;
-	int i;
+	int i = 0;
 
 	if (!evlist)
 		return NULL;
-- 
2.14.3


* [PATCH 23/41] perf cs-etm: Properly deal with cpu maps
  2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (2 preceding siblings ...)
  2018-02-16 19:17 ` [PATCH 22/41] perf auxtrace arm: Fixing uninitialised variable Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  2018-02-16 19:17 ` [PATCH 27/41] perf cs-etm: Inject capabilitity for CoreSight traces Arnaldo Carvalho de Melo
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: linux-arm-kernel

From: Mathieu Poirier <mathieu.poirier@linaro.org>

This patch allows the CoreSight AUX info section to fit topologies where
only a subset of all available CPUs is online, while also avoiding
access to the ETM configuration areas of CPUs that have been offlined.
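
The selection rule can be sketched as below.  This is an illustration
only: it models CPU sets as 64-bit masks (a hypothetical stand-in for
perf's struct cpu_map) and counts ETMv3/ETMv4 tracers only for CPUs
that are online and, when a CPU list was requested, also part of that
list.  An empty request means all online CPUs, mirroring the empty
cpu_map case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 64

/* Hypothetical CPU set: bit n set means CPU n is in the set */
typedef uint64_t cpu_set_bits;

static bool cpu_set_has(cpu_set_bits set, int cpu)
{
	return (set >> cpu) & 1;
}

/*
 * Count the tracers that would get an AUX info entry.  Offline CPUs
 * are always skipped, so their ETM configuration is never accessed.
 */
static void count_tracers(cpu_set_bits requested, cpu_set_bits online,
			  cpu_set_bits is_etmv4, int *etmv3, int *etmv4)
{
	int cpu;

	assert(etmv3 && etmv4);
	*etmv3 = *etmv4 = 0;

	for (cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (!cpu_set_has(online, cpu))
			continue;
		/* empty request: every online CPU is involved */
		if (requested && !cpu_set_has(requested, cpu))
			continue;

		if (cpu_set_has(is_etmv4, cpu))
			(*etmv4)++;
		else
			(*etmv3)++;
	}
}
```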

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel at lists.infradead.org
Link: http://lkml.kernel.org/r/1518478737-24649-1-git-send-email-mathieu.poirier at linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/arm/util/cs-etm.c | 51 +++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index fbfc055d3f4d..5c655ad4621e 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -298,12 +298,17 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 {
 	int i;
 	int etmv3 = 0, etmv4 = 0;
-	const struct cpu_map *cpus = evlist->cpus;
+	struct cpu_map *event_cpus = evlist->cpus;
+	struct cpu_map *online_cpus = cpu_map__new(NULL);
 
 	/* cpu map is not empty, we have specific CPUs to work with */
-	if (!cpu_map__empty(cpus)) {
-		for (i = 0; i < cpu_map__nr(cpus); i++) {
-			if (cs_etm_is_etmv4(itr, cpus->map[i]))
+	if (!cpu_map__empty(event_cpus)) {
+		for (i = 0; i < cpu__max_cpu(); i++) {
+			if (!cpu_map__has(event_cpus, i) ||
+			    !cpu_map__has(online_cpus, i))
+				continue;
+
+			if (cs_etm_is_etmv4(itr, i))
 				etmv4++;
 			else
 				etmv3++;
@@ -311,6 +316,9 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 	} else {
 		/* get configuration for all CPUs in the system */
 		for (i = 0; i < cpu__max_cpu(); i++) {
+			if (!cpu_map__has(online_cpus, i))
+				continue;
+
 			if (cs_etm_is_etmv4(itr, i))
 				etmv4++;
 			else
@@ -318,6 +326,8 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 		}
 	}
 
+	cpu_map__put(online_cpus);
+
 	return (CS_ETM_HEADER_SIZE +
 	       (etmv4 * CS_ETMV4_PRIV_SIZE) +
 	       (etmv3 * CS_ETMV3_PRIV_SIZE));
@@ -447,7 +457,9 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
 	int i;
 	u32 offset;
 	u64 nr_cpu, type;
-	const struct cpu_map *cpus = session->evlist->cpus;
+	struct cpu_map *cpu_map;
+	struct cpu_map *event_cpus = session->evlist->cpus;
+	struct cpu_map *online_cpus = cpu_map__new(NULL);
 	struct cs_etm_recording *ptr =
 			container_of(itr, struct cs_etm_recording, itr);
 	struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
@@ -458,8 +470,21 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
 	if (!session->evlist->nr_mmaps)
 		return -EINVAL;
 
-	/* If the cpu_map is empty all CPUs are involved */
-	nr_cpu = cpu_map__empty(cpus) ? cpu__max_cpu() : cpu_map__nr(cpus);
+	/* If the cpu_map is empty all online CPUs are involved */
+	if (cpu_map__empty(event_cpus)) {
+		cpu_map = online_cpus;
+	} else {
+		/* Make sure all specified CPUs are online */
+		for (i = 0; i < cpu_map__nr(event_cpus); i++) {
+			if (cpu_map__has(event_cpus, i) &&
+			    !cpu_map__has(online_cpus, i))
+				return -EINVAL;
+		}
+
+		cpu_map = event_cpus;
+	}
+
+	nr_cpu = cpu_map__nr(cpu_map);
 	/* Get PMU type as dynamically assigned by the core */
 	type = cs_etm_pmu->type;
 
@@ -472,15 +497,11 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
 
 	offset = CS_ETM_SNAPSHOT + 1;
 
-	/* cpu map is not empty, we have specific CPUs to work with */
-	if (!cpu_map__empty(cpus)) {
-		for (i = 0; i < cpu_map__nr(cpus) && offset < priv_size; i++)
-			cs_etm_get_metadata(cpus->map[i], &offset, itr, info);
-	} else {
-		/* get configuration for all CPUs in the system */
-		for (i = 0; i < cpu__max_cpu(); i++)
+	for (i = 0; i < cpu__max_cpu() && offset < priv_size; i++)
+		if (cpu_map__has(cpu_map, i))
 			cs_etm_get_metadata(i, &offset, itr, info);
-	}
+
+	cpu_map__put(online_cpus);
 
 	return 0;
 }
-- 
2.14.3


* [PATCH 27/41] perf cs-etm: Inject capabilitity for CoreSight traces
  2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (3 preceding siblings ...)
  2018-02-16 19:17 ` [PATCH 23/41] perf cs-etm: Properly deal with cpu maps Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  2018-02-16 19:17 ` [PATCH 28/41] perf inject: Emit instruction records on ETM trace discontinuity Arnaldo Carvalho de Melo
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: linux-arm-kernel

From: Robert Walker <robert.walker@arm.com>

Add user space perf functionality to translate CoreSight traces into
instruction events with a branch stack.

To invoke the new functionality, use the perf inject tool with
--itrace=il.  For example, to translate the ETM trace from perf.data into
last branch records in a new perf.data.new file:

    $ perf inject --itrace=i100000il128 -i perf.data -o perf.data.new

The 'i' parameter to itrace generates periodic instruction events.  The
period between instruction events can be specified as a number of
instructions suffixed by i (default 100000).
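
The placement of a periodic sample inside an executed range can be
sketched as follows.  The helper name and signature are hypothetical.
For the first sample point it uses the same arithmetic as
cs_etm__sample() in this patch (offset = executed - over - 1, with the
remainder carried into the next period).  As an assumption of this
sketch, it keeps going when a single range spans more than one period,
whereas the patch emits at most one sample per range.

```c
#include <assert.h>
#include <stdint.h>

#define A64_INSTR_SIZE 4	/* fixed A64 instruction size, as in the patch */

/*
 * Hypothetical helper: for an executed range [start, end) and the number
 * of instructions already counted towards the current period (*acc,
 * always < period), store the address of each instruction at which the
 * period expires.  Samples beyond 'max' are dropped.  Returns the number
 * of addresses stored.
 */
static int period_samples(uint64_t start, uint64_t end, uint64_t period,
			  uint64_t *acc, uint64_t *addrs, int max)
{
	uint64_t executed, first, k;
	int n = 0;

	assert(period > 0 && *acc < period && end >= start);
	executed = (end - start) / A64_INSTR_SIZE;

	/* 1-based index in this range of the first instruction to sample */
	first = period - *acc;
	if (first > executed) {
		*acc += executed;	/* period does not expire in this range */
		return 0;
	}

	/* k - 1: report the instruction just executed, not the next PC */
	for (k = first; k <= executed && n < max; k += period)
		addrs[n++] = start + (k - 1) * A64_INSTR_SIZE;

	/* Carry the instructions after the last sample point forward */
	*acc = (executed - first) % period;
	return n;
}
```

With a period of 4 and a fresh counter, a 10-instruction range starting
at 0x1000 yields samples at 0x100c and 0x101c and carries 2 instructions
into the next range.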

The parameter to 'l' specifies the number of entries in the branch stack
attached to instruction events.
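
The 'l' branch stack is kept in a circular buffer filled from the end
downwards, so the newest entry sits at the insert position, and it is
linearised newest-first in two copies.  The sketch below mirrors the
logic of cs_etm__update_last_branch_rb() and
cs_etm__copy_last_branch_rb() from this patch, but with hypothetical,
simplified types (struct lb_ring, a fixed LB_MAX bound) instead of
perf's struct branch_stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LB_MAX 8	/* hypothetical upper bound for the 'l' size */

struct lb_entry { uint64_t from, to; };

/* Hypothetical ring buffer modelled on last_branch_rb/last_branch_pos */
struct lb_ring {
	struct lb_entry entries[LB_MAX];
	unsigned int sz;	/* entries requested with 'l', <= LB_MAX */
	unsigned int pos;	/* slot holding the most recent entry */
	unsigned int nr;	/* valid entries, saturates at sz */
};

/* Insert moving downwards so that entries[pos..sz-1] is newest-first */
static void lb_push(struct lb_ring *rb, uint64_t from, uint64_t to)
{
	assert(rb->sz > 0 && rb->sz <= LB_MAX);
	if (!rb->pos)
		rb->pos = rb->sz;	/* wrap back to the end */
	rb->pos--;
	rb->entries[rb->pos].from = from;
	rb->entries[rb->pos].to = to;
	if (rb->nr < rb->sz)
		rb->nr++;
}

/* Copy newest-first into 'out'; returns the number of entries copied */
static unsigned int lb_copy(const struct lb_ring *rb, struct lb_entry *out)
{
	unsigned int nr;

	if (!rb->nr)
		return 0;
	/* Step 1: from the newest entry to the end of the buffer */
	nr = rb->sz - rb->pos;
	memcpy(out, &rb->entries[rb->pos], nr * sizeof(*out));
	/* Step 2: after a wrap, the older entries at the start follow */
	if (rb->nr >= rb->sz)
		memcpy(&out[nr], &rb->entries[0], rb->pos * sizeof(*out));
	return rb->nr;
}
```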

The 'b' parameter to itrace generates events on taken branches.

This patch also fixes the contents of the branch events used by perf
report.  Previously, a branch event was generated for each contiguous
range of executed instructions.  Now a branch event is generated only
when a range ends in a taken branch instruction, going from the last
instruction of that range to the start address of the next range.
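
The new branch event rule can be sketched as follows.  The struct below
is a hypothetical, reduced form of struct cs_etm_packet, and the helper
name is illustrative; only the address computation (exclusive end
address minus one A64 instruction as the source, next range start as
the target) follows the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define A64_INSTR_SIZE 4

/* Hypothetical reduced form of struct cs_etm_packet */
struct range {
	uint64_t start_addr;
	uint64_t end_addr;		/* exclusive */
	bool last_instr_taken_branch;
};

struct branch { uint64_t from, to; };

/*
 * A branch event is produced only when the previous range ended in a
 * taken branch: it goes from that range's last executed instruction to
 * the first address of the current range.
 */
static bool make_branch(const struct range *prev, const struct range *cur,
			struct branch *out)
{
	assert(prev && cur && out);
	if (!prev->last_instr_taken_branch)
		return false;
	out->from = prev->end_addr - A64_INSTR_SIZE;
	out->to = cur->start_addr;
	return true;
}
```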

Based on patches by Sebastian Pop <s.pop@samsung.com> with additional fixes
and support for specifying the instruction period.

Originally-by: Sebastian Pop <s.pop@samsung.com>
Signed-off-by: Robert Walker <robert.walker@arm.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight at lists.linaro.org
Cc: linux-arm-kernel at lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-2-git-send-email-robert.walker at arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  65 +++-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |   1 +
 tools/perf/util/cs-etm.c                        | 434 +++++++++++++++++++++---
 3 files changed, 436 insertions(+), 64 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 1fb01849f1c7..8ff69dfd725a 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -78,6 +78,8 @@ int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
 {
 	ocsd_datapath_resp_t dp_ret;
 
+	decoder->prev_return = OCSD_RESP_CONT;
+
 	dp_ret = ocsd_dt_process_data(decoder->dcd_tree, OCSD_OP_RESET,
 				      0, 0, NULL, NULL);
 	if (OCSD_DATA_RESP_IS_FATAL(dp_ret))
@@ -253,16 +255,16 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
 	decoder->packet_count = 0;
 	for (i = 0; i < MAX_BUFFER; i++) {
 		decoder->packet_buffer[i].start_addr = 0xdeadbeefdeadbeefUL;
-		decoder->packet_buffer[i].end_addr   = 0xdeadbeefdeadbeefUL;
-		decoder->packet_buffer[i].exc	     = false;
-		decoder->packet_buffer[i].exc_ret    = false;
-		decoder->packet_buffer[i].cpu	     = INT_MIN;
+		decoder->packet_buffer[i].end_addr = 0xdeadbeefdeadbeefUL;
+		decoder->packet_buffer[i].last_instr_taken_branch = false;
+		decoder->packet_buffer[i].exc = false;
+		decoder->packet_buffer[i].exc_ret = false;
+		decoder->packet_buffer[i].cpu = INT_MIN;
 	}
 }
 
 static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
-			      const ocsd_generic_trace_elem *elem,
 			      const u8 trace_chan_id,
 			      enum cs_etm_sample_type sample_type)
 {
@@ -278,18 +280,16 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 		return OCSD_RESP_FATAL_SYS_ERR;
 
 	et = decoder->tail;
+	et = (et + 1) & (MAX_BUFFER - 1);
+	decoder->tail = et;
+	decoder->packet_count++;
+
 	decoder->packet_buffer[et].sample_type = sample_type;
-	decoder->packet_buffer[et].start_addr = elem->st_addr;
-	decoder->packet_buffer[et].end_addr = elem->en_addr;
 	decoder->packet_buffer[et].exc = false;
 	decoder->packet_buffer[et].exc_ret = false;
 	decoder->packet_buffer[et].cpu = *((int *)inode->priv);
-
-	/* Wrap around if need be */
-	et = (et + 1) & (MAX_BUFFER - 1);
-
-	decoder->tail = et;
-	decoder->packet_count++;
+	decoder->packet_buffer[et].start_addr = 0xdeadbeefdeadbeefUL;
+	decoder->packet_buffer[et].end_addr = 0xdeadbeefdeadbeefUL;
 
 	if (decoder->packet_count == MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -297,6 +297,40 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 	return OCSD_RESP_CONT;
 }
 
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
+			     const ocsd_generic_trace_elem *elem,
+			     const uint8_t trace_chan_id)
+{
+	int ret = 0;
+	struct cs_etm_packet *packet;
+
+	ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+					    CS_ETM_RANGE);
+	if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+		return ret;
+
+	packet = &decoder->packet_buffer[decoder->tail];
+
+	packet->start_addr = elem->st_addr;
+	packet->end_addr = elem->en_addr;
+	switch (elem->last_i_type) {
+	case OCSD_INSTR_BR:
+	case OCSD_INSTR_BR_INDIRECT:
+		packet->last_instr_taken_branch = elem->last_instr_exec;
+		break;
+	case OCSD_INSTR_ISB:
+	case OCSD_INSTR_DSB_DMB:
+	case OCSD_INSTR_OTHER:
+	default:
+		packet->last_instr_taken_branch = false;
+		break;
+	}
+
+	return ret;
+
+}
+
 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 				const void *context,
 				const ocsd_trc_index_t indx __maybe_unused,
@@ -316,9 +350,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 		decoder->trace_on = true;
 		break;
 	case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
-		resp = cs_etm_decoder__buffer_packet(decoder, elem,
-						     trace_chan_id,
-						     CS_ETM_RANGE);
+		resp = cs_etm_decoder__buffer_range(decoder, elem,
+						    trace_chan_id);
 		break;
 	case OCSD_GEN_TRC_ELEM_EXCEPTION:
 		decoder->packet_buffer[decoder->tail].exc = true;
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 3d2e6205d186..a4fdd285b145 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -30,6 +30,7 @@ struct cs_etm_packet {
 	enum cs_etm_sample_type sample_type;
 	u64 start_addr;
 	u64 end_addr;
+	u8 last_instr_taken_branch;
 	u8 exc;
 	u8 exc_ret;
 	int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index f2c98774e665..6e595d96c04d 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -32,6 +32,14 @@
 
 #define MAX_TIMESTAMP (~0ULL)
 
+/*
+ * A64 instructions are always 4 bytes
+ *
+ * Only A64 is supported, so can use this constant for converting between
+ * addresses and instruction counts, calculating offsets etc
+ */
+#define A64_INSTR_SIZE 4
+
 struct cs_etm_auxtrace {
 	struct auxtrace auxtrace;
 	struct auxtrace_queues queues;
@@ -45,11 +53,15 @@ struct cs_etm_auxtrace {
 	u8 snapshot_mode;
 	u8 data_queued;
 	u8 sample_branches;
+	u8 sample_instructions;
 
 	int num_cpu;
 	u32 auxtrace_type;
 	u64 branches_sample_type;
 	u64 branches_id;
+	u64 instructions_sample_type;
+	u64 instructions_sample_period;
+	u64 instructions_id;
 	u64 **metadata;
 	u64 kernel_start;
 	unsigned int pmu_type;
@@ -68,6 +80,12 @@ struct cs_etm_queue {
 	u64 time;
 	u64 timestamp;
 	u64 offset;
+	u64 period_instructions;
+	struct branch_stack *last_branch;
+	struct branch_stack *last_branch_rb;
+	size_t last_branch_pos;
+	struct cs_etm_packet *prev_packet;
+	struct cs_etm_packet *packet;
 };
 
 static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
@@ -180,6 +198,10 @@ static void cs_etm__free_queue(void *priv)
 	thread__zput(etmq->thread);
 	cs_etm_decoder__free(etmq->decoder);
 	zfree(&etmq->event_buf);
+	zfree(&etmq->last_branch);
+	zfree(&etmq->last_branch_rb);
+	zfree(&etmq->prev_packet);
+	zfree(&etmq->packet);
 	free(etmq);
 }
 
@@ -276,11 +298,35 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 	struct cs_etm_decoder_params d_params;
 	struct cs_etm_trace_params  *t_params;
 	struct cs_etm_queue *etmq;
+	size_t szp = sizeof(struct cs_etm_packet);
 
 	etmq = zalloc(sizeof(*etmq));
 	if (!etmq)
 		return NULL;
 
+	etmq->packet = zalloc(szp);
+	if (!etmq->packet)
+		goto out_free;
+
+	if (etm->synth_opts.last_branch || etm->sample_branches) {
+		etmq->prev_packet = zalloc(szp);
+		if (!etmq->prev_packet)
+			goto out_free;
+	}
+
+	if (etm->synth_opts.last_branch) {
+		size_t sz = sizeof(struct branch_stack);
+
+		sz += etm->synth_opts.last_branch_sz *
+		      sizeof(struct branch_entry);
+		etmq->last_branch = zalloc(sz);
+		if (!etmq->last_branch)
+			goto out_free;
+		etmq->last_branch_rb = zalloc(sz);
+		if (!etmq->last_branch_rb)
+			goto out_free;
+	}
+
 	etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
 	if (!etmq->event_buf)
 		goto out_free;
@@ -335,6 +381,7 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 		goto out_free_decoder;
 
 	etmq->offset = 0;
+	etmq->period_instructions = 0;
 
 	return etmq;
 
@@ -342,6 +389,10 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 	cs_etm_decoder__free(etmq->decoder);
 out_free:
 	zfree(&etmq->event_buf);
+	zfree(&etmq->last_branch);
+	zfree(&etmq->last_branch_rb);
+	zfree(&etmq->prev_packet);
+	zfree(&etmq->packet);
 	free(etmq);
 
 	return NULL;
@@ -395,6 +446,129 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }
 
+static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	struct branch_stack *bs_src = etmq->last_branch_rb;
+	struct branch_stack *bs_dst = etmq->last_branch;
+	size_t nr = 0;
+
+	/*
+	 * Set the number of records before early exit: ->nr is used to
+	 * determine how many branches to copy from ->entries.
+	 */
+	bs_dst->nr = bs_src->nr;
+
+	/*
+	 * Early exit when there is nothing to copy.
+	 */
+	if (!bs_src->nr)
+		return;
+
+	/*
+	 * As bs_src->entries is a circular buffer, we need to copy from it in
+	 * two steps.  First, copy the branches from the most recently inserted
+	 * branch ->last_branch_pos until the end of bs_src->entries buffer.
+	 */
+	nr = etmq->etm->synth_opts.last_branch_sz - etmq->last_branch_pos;
+	memcpy(&bs_dst->entries[0],
+	       &bs_src->entries[etmq->last_branch_pos],
+	       sizeof(struct branch_entry) * nr);
+
+	/*
+	 * If we wrapped around at least once, the branches from the beginning
+	 * of the bs_src->entries buffer and until the ->last_branch_pos element
+	 * are older valid branches: copy them over.  The total number of
+	 * branches copied over will be equal to the number of branches asked by
+	 * the user in last_branch_sz.
+	 */
+	if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) {
+		memcpy(&bs_dst->entries[nr],
+		       &bs_src->entries[0],
+		       sizeof(struct branch_entry) * etmq->last_branch_pos);
+	}
+}
+
+static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	etmq->last_branch_pos = 0;
+	etmq->last_branch_rb->nr = 0;
+}
+
+static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
+{
+	/*
+	 * The packet records the execution range with an exclusive end address
+	 *
+	 * A64 instructions are constant size, so the last executed
+	 * instruction is A64_INSTR_SIZE before the end address
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return packet->end_addr - A64_INSTR_SIZE;
+}
+
+static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+{
+	/*
+	 * Only A64 instructions are currently supported, so can get
+	 * instruction count by dividing.
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+}
+
+static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+				     u64 offset)
+{
+	/*
+	 * Only A64 instructions are currently supported, so can get
+	 * instruction address by multiplying.
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return packet->start_addr + offset * A64_INSTR_SIZE;
+}
+
+static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	struct branch_stack *bs = etmq->last_branch_rb;
+	struct branch_entry *be;
+
+	/*
+	 * The branches are recorded in a circular buffer in reverse
+	 * chronological order: we start recording from the last element of the
+	 * buffer down.  After writing the first element of the stack, move the
+	 * insert position back to the end of the buffer.
+	 */
+	if (!etmq->last_branch_pos)
+		etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
+
+	etmq->last_branch_pos -= 1;
+
+	be       = &bs->entries[etmq->last_branch_pos];
+	be->from = cs_etm__last_executed_instr(etmq->prev_packet);
+	be->to	 = etmq->packet->start_addr;
+	/* No support for mispredict */
+	be->flags.mispred = 0;
+	be->flags.predicted = 1;
+
+	/*
+	 * Increment bs->nr until reaching the number of last branches asked by
+	 * the user on the command line.
+	 */
+	if (bs->nr < etmq->etm->synth_opts.last_branch_sz)
+		bs->nr += 1;
+}
+
+static int cs_etm__inject_event(union perf_event *event,
+			       struct perf_sample *sample, u64 type)
+{
+	event->header.size = perf_event__sample_event_size(sample, type, 0);
+	return perf_event__synthesize_sample(event, type, 0, sample);
+}
+
+
 static int
 cs_etm__get_trace(struct cs_etm_buffer *buff, struct cs_etm_queue *etmq)
 {
@@ -459,35 +633,105 @@ static void  cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm,
 	}
 }
 
+static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
+					    u64 addr, u64 period)
+{
+	int ret = 0;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	union perf_event *event = etmq->event_buf;
+	struct perf_sample sample = {.ip = 0,};
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	sample.ip = addr;
+	sample.pid = etmq->pid;
+	sample.tid = etmq->tid;
+	sample.id = etmq->etm->instructions_id;
+	sample.stream_id = etmq->etm->instructions_id;
+	sample.period = period;
+	sample.cpu = etmq->packet->cpu;
+	sample.flags = 0;
+	sample.insn_len = 1;
+	sample.cpumode = event->header.misc;
+
+	if (etm->synth_opts.last_branch) {
+		cs_etm__copy_last_branch_rb(etmq);
+		sample.branch_stack = etmq->last_branch;
+	}
+
+	if (etm->synth_opts.inject) {
+		ret = cs_etm__inject_event(event, &sample,
+					   etm->instructions_sample_type);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(etm->session, event, &sample);
+
+	if (ret)
+		pr_err(
+			"CS ETM Trace: failed to deliver instruction event, error %d\n",
+			ret);
+
+	if (etm->synth_opts.last_branch)
+		cs_etm__reset_last_branch_rb(etmq);
+
+	return ret;
+}
+
 /*
  * The cs etm packet encodes an instruction range between a branch target
  * and the next taken branch. Generate sample accordingly.
  */
-static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
-				       struct cs_etm_packet *packet)
+static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
 {
 	int ret = 0;
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct perf_sample sample = {.ip = 0,};
 	union perf_event *event = etmq->event_buf;
-	u64 start_addr = packet->start_addr;
-	u64 end_addr = packet->end_addr;
+	struct dummy_branch_stack {
+		u64			nr;
+		struct branch_entry	entries;
+	} dummy_bs;
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = PERF_RECORD_MISC_USER;
 	event->sample.header.size = sizeof(struct perf_event_header);
 
-	sample.ip = start_addr;
+	sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
 	sample.pid = etmq->pid;
 	sample.tid = etmq->tid;
-	sample.addr = end_addr;
+	sample.addr = etmq->packet->start_addr;
 	sample.id = etmq->etm->branches_id;
 	sample.stream_id = etmq->etm->branches_id;
 	sample.period = 1;
-	sample.cpu = packet->cpu;
+	sample.cpu = etmq->packet->cpu;
 	sample.flags = 0;
 	sample.cpumode = PERF_RECORD_MISC_USER;
 
+	/*
+	 * perf report cannot handle events without a branch stack
+	 */
+	if (etm->synth_opts.last_branch) {
+		dummy_bs = (struct dummy_branch_stack){
+			.nr = 1,
+			.entries = {
+				.from = sample.ip,
+				.to = sample.addr,
+			},
+		};
+		sample.branch_stack = (struct branch_stack *)&dummy_bs;
+	}
+
+	if (etm->synth_opts.inject) {
+		ret = cs_etm__inject_event(event, &sample,
+					   etm->branches_sample_type);
+		if (ret)
+			return ret;
+	}
+
 	ret = perf_session__deliver_synth_event(etm->session, event, &sample);
 
 	if (ret)
@@ -584,6 +828,24 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 		etm->sample_branches = true;
 		etm->branches_sample_type = attr.sample_type;
 		etm->branches_id = id;
+		id += 1;
+		attr.sample_type &= ~(u64)PERF_SAMPLE_ADDR;
+	}
+
+	if (etm->synth_opts.last_branch)
+		attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+
+	if (etm->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		attr.sample_period = etm->synth_opts.period;
+		etm->instructions_sample_period = attr.sample_period;
+		err = cs_etm__synth_event(session, &attr, id);
+		if (err)
+			return err;
+		etm->sample_instructions = true;
+		etm->instructions_sample_type = attr.sample_type;
+		etm->instructions_id = id;
+		id += 1;
 	}
 
 	return 0;
@@ -591,20 +853,68 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 
 static int cs_etm__sample(struct cs_etm_queue *etmq)
 {
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	struct cs_etm_packet *tmp;
 	int ret;
-	struct cs_etm_packet packet;
+	u64 instrs_executed;
 
-	while (1) {
-		ret = cs_etm_decoder__get_packet(etmq->decoder, &packet);
-		if (ret <= 0)
+	instrs_executed = cs_etm__instr_count(etmq->packet);
+	etmq->period_instructions += instrs_executed;
+
+	/*
+	 * Record a branch when the last instruction in
+	 * PREV_PACKET is a branch.
+	 */
+	if (etm->synth_opts.last_branch &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->last_instr_taken_branch)
+		cs_etm__update_last_branch_rb(etmq);
+
+	if (etm->sample_instructions &&
+	    etmq->period_instructions >= etm->instructions_sample_period) {
+		/*
+		 * Emit instruction sample periodically
+		 * TODO: allow period to be defined in cycles and clock time
+		 */
+
+		/* Get number of instructions executed after the sample point */
+		u64 instrs_over = etmq->period_instructions -
+			etm->instructions_sample_period;
+
+		/*
+		 * Calculate the address of the sampled instruction (-1 as
+		 * sample is reported as though instruction has just been
+		 * executed, but PC has not advanced to next instruction)
+		 */
+		u64 offset = (instrs_executed - instrs_over - 1);
+		u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+
+		ret = cs_etm__synth_instruction_sample(
+			etmq, addr, etm->instructions_sample_period);
+		if (ret)
+			return ret;
+
+		/* Carry remaining instructions into next sample period */
+		etmq->period_instructions = instrs_over;
+	}
+
+	if (etm->sample_branches &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+	    etmq->prev_packet->last_instr_taken_branch) {
+		ret = cs_etm__synth_branch_sample(etmq);
+		if (ret)
 			return ret;
+	}
 
+	if (etm->sample_branches || etm->synth_opts.last_branch) {
 		/*
-		 * If the packet contains an instruction range, generate an
-		 * instruction sequence event.
+		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+		 * the next incoming packet.
 		 */
-		if (packet.sample_type & CS_ETM_RANGE)
-			cs_etm__synth_branch_sample(etmq, &packet);
+		tmp = etmq->packet;
+		etmq->packet = etmq->prev_packet;
+		etmq->prev_packet = tmp;
 	}
 
 	return 0;
@@ -621,45 +931,73 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 		etm->kernel_start = machine__kernel_start(etm->machine);
 
 	/* Go through each buffer in the queue and decode them one by one */
-more:
-	buffer_used = 0;
-	memset(&buffer, 0, sizeof(buffer));
-	err = cs_etm__get_trace(&buffer, etmq);
-	if (err <= 0)
-		return err;
-	/*
-	 * We cannot assume consecutive blocks in the data file are contiguous,
-	 * reset the decoder to force re-sync.
-	 */
-	err = cs_etm_decoder__reset(etmq->decoder);
-	if (err != 0)
-		return err;
-
-	/* Run trace decoder until buffer consumed or end of trace */
-	do {
-		processed = 0;
-
-		err = cs_etm_decoder__process_data_block(
-						etmq->decoder,
-						etmq->offset,
-						&buffer.buf[buffer_used],
-						buffer.len - buffer_used,
-						&processed);
-
-		if (err)
+	while (1) {
+		buffer_used = 0;
+		memset(&buffer, 0, sizeof(buffer));
+		err = cs_etm__get_trace(&buffer, etmq);
+		if (err <= 0)
+			return err;
+		/*
+		 * We cannot assume consecutive blocks in the data file are
+		 * contiguous, reset the decoder to force re-sync.
+		 */
+		err = cs_etm_decoder__reset(etmq->decoder);
+		if (err != 0)
 			return err;
 
-		etmq->offset += processed;
-		buffer_used += processed;
+		/* Run trace decoder until buffer consumed or end of trace */
+		do {
+			processed = 0;
+			err = cs_etm_decoder__process_data_block(
+				etmq->decoder,
+				etmq->offset,
+				&buffer.buf[buffer_used],
+				buffer.len - buffer_used,
+				&processed);
+			if (err)
+				return err;
+
+			etmq->offset += processed;
+			buffer_used += processed;
+
+			/* Process each packet in this chunk */
+			while (1) {
+				err = cs_etm_decoder__get_packet(etmq->decoder,
+								 etmq->packet);
+				if (err <= 0)
+					/*
+					 * Stop processing this chunk on
+					 * end of data or error
+					 */
+					break;
+
+				/*
+				 * If the packet contains an instruction
+				 * range, generate instruction sequence
+				 * events.
+				 */
+				if (etmq->packet->sample_type & CS_ETM_RANGE)
+					err = cs_etm__sample(etmq);
+			}
+		} while (buffer.len > buffer_used);
 
 		/*
-		 * Nothing to do with an error condition, let's hope the next
-		 * chunk will be better.
+		 * Generate a last branch event for the branches left in
+		 * the circular buffer at the end of the trace.
 		 */
-		err = cs_etm__sample(etmq);
-	} while (buffer.len > buffer_used);
+		if (etm->sample_instructions &&
+		    etmq->etm->synth_opts.last_branch) {
+			struct branch_stack *bs = etmq->last_branch_rb;
+			struct branch_entry *be =
+				&bs->entries[etmq->last_branch_pos];
+
+			err = cs_etm__synth_instruction_sample(
+				etmq, be->to, etmq->period_instructions);
+			if (err)
+				return err;
+		}
 
-goto more;
+	}
 
 	return err;
 }
-- 
2.14.3


* [PATCH 28/41] perf inject: Emit instruction records on ETM trace discontinuity
  2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (4 preceding siblings ...)
  2018-02-16 19:17 ` [PATCH 27/41] perf cs-etm: Inject capabilitity for CoreSight traces Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  2018-02-16 19:17 ` [PATCH 29/41] coresight: Update documentation for perf usage Arnaldo Carvalho de Melo
  2018-02-17 10:49 ` [GIT PULL 00/41] perf/core improvements and fixes Ingo Molnar
  7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: linux-arm-kernel

From: Robert Walker <robert.walker@arm.com>

There may be discontinuities in the ETM trace stream due to overflows or
to ETM configuration for selective trace.  This patch emits an
instruction sample with the pending branch stack whenever a TRACE ON
packet, indicating a discontinuity in the trace data, is received.

A new packet type, CS_ETM_TRACE_ON, is added.  The low-level decoder
emits it when a TRACE ON occurs, and the higher-level decoder flushes
the branch stack when it sees this packet.
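
The dispatch described above can be sketched as a small state machine.
Everything here is hypothetical and simplified: the packet kinds, the
struct state fields and the "emit" step (which only records the sample
address) stand in for the real cs_etm__sample()/cs_etm__flush() code,
and the real flush additionally depends on the last_branch option.
What the sketch keeps from the patch is that a flush reports the last
executed instruction of the previous range, resets the pending
instruction count, and prevents a branch from being recorded across
the gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define A64_INSTR_SIZE 4

enum pkt_type { PKT_RANGE, PKT_TRACE_ON };	/* hypothetical packet kinds */

struct pkt {
	enum pkt_type type;
	uint64_t start_addr;	/* valid for PKT_RANGE only */
	uint64_t end_addr;	/* exclusive, valid for PKT_RANGE only */
};

struct state {
	struct pkt prev;		/* last range seen */
	bool have_prev;
	uint64_t pending_instrs;	/* instructions since last sample */
	int flushed;			/* samples emitted by flush_pending() */
	uint64_t last_flush_addr;	/* address of the last such sample */
};

/* Emit a sample for the pending range, then forget it */
static void flush_pending(struct state *st)
{
	if (st->have_prev && st->prev.type == PKT_RANGE) {
		/* "Emit": record the last executed instruction of the range */
		st->last_flush_addr = st->prev.end_addr - A64_INSTR_SIZE;
		st->flushed++;
		st->pending_instrs = 0;
	}
	/* No branch may be recorded across the gap in the trace */
	st->have_prev = false;
}

static void handle_packet(struct state *st, const struct pkt *p)
{
	switch (p->type) {
	case PKT_RANGE:
		assert(p->end_addr >= p->start_addr);
		st->pending_instrs +=
			(p->end_addr - p->start_addr) / A64_INSTR_SIZE;
		st->prev = *p;
		st->have_prev = true;
		break;
	case PKT_TRACE_ON:
		flush_pending(st);	/* discontinuity in the trace */
		break;
	}
}
```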

Signed-off-by: Robert Walker <robert.walker@arm.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight at lists.linaro.org
Cc: linux-arm-kernel at lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-3-git-send-email-robert.walker at arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  9 +++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  1 +
 tools/perf/util/cs-etm.c                        | 80 ++++++++++++++++++-------
 3 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 8ff69dfd725a..640af88331b4 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -328,7 +328,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 	}
 
 	return ret;
+}
 
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
+				const uint8_t trace_chan_id)
+{
+	return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+					     CS_ETM_TRACE_ON);
 }
 
 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
@@ -347,6 +354,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 		decoder->trace_on = false;
 		break;
 	case OCSD_GEN_TRC_ELEM_TRACE_ON:
+		resp = cs_etm_decoder__buffer_trace_on(decoder,
+						       trace_chan_id);
 		decoder->trace_on = true;
 		break;
 	case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index a4fdd285b145..743f5f444304 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -24,6 +24,7 @@ struct cs_etm_buffer {
 
 enum cs_etm_sample_type {
 	CS_ETM_RANGE = 1 << 0,
+	CS_ETM_TRACE_ON = 1 << 1,
 };
 
 struct cs_etm_packet {
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 6e595d96c04d..1b0d422373be 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -867,6 +867,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	 */
 	if (etm->synth_opts.last_branch &&
 	    etmq->prev_packet &&
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
 	    etmq->prev_packet->last_instr_taken_branch)
 		cs_etm__update_last_branch_rb(etmq);
 
@@ -920,6 +921,40 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	return 0;
 }
 
+static int cs_etm__flush(struct cs_etm_queue *etmq)
+{
+	int err = 0;
+	struct cs_etm_packet *tmp;
+
+	if (etmq->etm->synth_opts.last_branch &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+		/*
+		 * Generate a last branch event for the branches left in the
+		 * circular buffer at the end of the trace.
+		 *
+		 * Use the address of the end of the last reported execution
+		 * range
+		 */
+		u64 addr = cs_etm__last_executed_instr(etmq->prev_packet);
+
+		err = cs_etm__synth_instruction_sample(
+			etmq, addr,
+			etmq->period_instructions);
+		etmq->period_instructions = 0;
+
+		/*
+		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+		 * the next incoming packet.
+		 */
+		tmp = etmq->packet;
+		etmq->packet = etmq->prev_packet;
+		etmq->prev_packet = tmp;
+	}
+
+	return err;
+}
+
 static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
@@ -971,32 +1006,31 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 					 */
 					break;
 
-				/*
-				 * If the packet contains an instruction
-				 * range, generate instruction sequence
-				 * events.
-				 */
-				if (etmq->packet->sample_type & CS_ETM_RANGE)
-					err = cs_etm__sample(etmq);
+				switch (etmq->packet->sample_type) {
+				case CS_ETM_RANGE:
+					/*
+					 * If the packet contains an instruction
+					 * range, generate instruction sequence
+					 * events.
+					 */
+					cs_etm__sample(etmq);
+					break;
+				case CS_ETM_TRACE_ON:
+					/*
+					 * Discontinuity in trace, flush
+					 * previous branch stack
+					 */
+					cs_etm__flush(etmq);
+					break;
+				default:
+					break;
+				}
 			}
 		} while (buffer.len > buffer_used);
 
-		/*
-		 * Generate a last branch event for the branches left in
-		 * the circular buffer at the end of the trace.
-		 */
-		if (etm->sample_instructions &&
-		    etmq->etm->synth_opts.last_branch) {
-			struct branch_stack *bs = etmq->last_branch_rb;
-			struct branch_entry *be =
-				&bs->entries[etmq->last_branch_pos];
-
-			err = cs_etm__synth_instruction_sample(
-				etmq, be->to, etmq->period_instructions);
-			if (err)
-				return err;
-		}
-
+		if (err == 0)
+			/* Flush any remaining branch stack entries */
+			err = cs_etm__flush(etmq);
 	}
 
 	return err;
-- 
2.14.3
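
A minimal stand-alone sketch of the dispatch logic in this patch. Each decoded packet is handled according to its sample type, and a CS_ETM_TRACE_ON packet flushes the pending state. A branch is recorded only when the previous packet was an instruction range, so no branch is recorded across the discontinuity. The model makes several assumptions. The names (struct packet, struct queue, handle_range(), flush(), process()) are hypothetical stand-ins, not perf's API. Instructions are assumed to be fixed-size 4-byte A64 instructions. Periodic sampling is omitted, and "emitting a sample" is reduced to updating a few counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for CS_ETM_RANGE / CS_ETM_TRACE_ON. */
enum sample_type {
	SAMPLE_RANGE    = 1 << 0,
	SAMPLE_TRACE_ON = 1 << 1,
};

struct packet {
	enum sample_type type;
	uint64_t start_addr;
	uint64_t end_addr;		/* exclusive end of the executed range */
	bool last_instr_taken_branch;
};

struct queue {
	struct packet slots[2];		/* storage shared by packet/prev_packet */
	struct packet *packet;		/* packet currently being processed */
	struct packet *prev_packet;	/* packet processed before it */
	uint64_t period_instructions;	/* instructions since the last sample */
	unsigned int branches;		/* branches recorded so far */
	unsigned int flushed_samples;	/* samples emitted by flush() */
	uint64_t last_flush_addr;	/* address of the last flushed sample */
	uint64_t last_flush_period;	/* period of the last flushed sample */
};

static void queue_init(struct queue *q)
{
	memset(q, 0, sizeof(*q));	/* type 0 matches no sample type */
	q->packet = &q->slots[0];
	q->prev_packet = &q->slots[1];
}

static void swap_packets(struct queue *q)
{
	struct packet *tmp = q->packet;

	q->packet = q->prev_packet;
	q->prev_packet = tmp;
}

/* Instruction range: count a branch only if the previous packet was a range. */
static void handle_range(struct queue *q)
{
	if (q->prev_packet->type == SAMPLE_RANGE &&
	    q->prev_packet->last_instr_taken_branch)
		q->branches++;

	/* Assumes fixed-size 4-byte A64 instructions. */
	q->period_instructions +=
		(q->packet->end_addr - q->packet->start_addr) / 4;
	swap_packets(q);
}

/* Discontinuity or end of buffer: emit one sample for a pending range. */
static void flush(struct queue *q)
{
	if (q->prev_packet->type != SAMPLE_RANGE)
		return;

	/* Last executed instruction is one instruction before the end. */
	q->last_flush_addr = q->prev_packet->end_addr - 4;
	q->last_flush_period = q->period_instructions;
	q->flushed_samples++;
	q->period_instructions = 0;
	swap_packets(q);	/* the TRACE_ON packet becomes prev_packet */
}

static void process(struct queue *q, const struct packet *in, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(q->packet != q->prev_packet);
		*q->packet = in[i];	/* the decoder fills the current slot */

		switch (q->packet->type) {
		case SAMPLE_RANGE:
			handle_range(q);
			break;
		case SAMPLE_TRACE_ON:
			flush(q);
			break;
		default:
			break;
		}
	}
	flush(q);	/* flush whatever is left at the end of the buffer */
}
```

For example, suppose the input is RANGE (taken branch), RANGE (taken branch), TRACE_ON, RANGE. The model records one branch, between the first two ranges, and none across the discontinuity. It emits two flush samples: one at the TRACE_ON packet and one at the end of the buffer.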


* [PATCH 29/41] coresight: Update documentation for perf usage
  2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (5 preceding siblings ...)
  2018-02-16 19:17 ` [PATCH 28/41] perf inject: Emit instruction records on ETM trace discontinuity Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  2018-02-17 10:49 ` [GIT PULL 00/41] perf/core improvements and fixes Ingo Molnar
  7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: linux-arm-kernel

From: Robert Walker <robert.walker@arm.com>

Add notes on using perf to collect and analyze CoreSight trace

Signed-off-by: Robert Walker <robert.walker@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-4-git-send-email-robert.walker@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 Documentation/trace/coresight.txt | 51 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index a33c88cd5d1d..6f0120c3a4f1 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -330,3 +330,54 @@ Details on how to use the generic STM API can be found here [2].
 
 [1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
 [2]. Documentation/trace/stm.txt
+
+
+Using perf tools
+----------------
+
+perf can be used to record and analyze traces of programs.
+
+Execution can be recorded using 'perf record' with the cs_etm event,
+specifying the name of the sink to record to, e.g.:
+
+    perf record -e cs_etm/@20070000.etr/u --per-thread
+
+The 'perf report' and 'perf script' commands can be used to analyze execution,
+synthesizing instruction and branch events from the instruction trace.
+'perf inject' can be used to replace the trace data with the synthesized events.
+The --itrace option controls the type and frequency of synthesized events
+(see the perf documentation).
+
+Note that only 64-bit programs are currently supported; further work is
+required to support instruction decode of 32-bit Arm programs.
+
+
+Generating coverage files for Feedback Directed Optimization: AutoFDO
+---------------------------------------------------------------------
+
+'perf inject' accepts the --itrace option, in which case tracing data is
+removed and replaced with the synthesized events, e.g.:
+
+	perf inject --itrace --strip -i perf.data -o perf.data.new
+
+Below is an example of using ARM ETM for AutoFDO.  It requires AutoFDO
+(https://github.com/google/autofdo) and gcc version 5.  The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
+
+	$ gcc-5 -O3 sort.c -o sort
+	$ taskset -c 2 ./sort
+	Bubble sorting array of 30000 elements
+	5910 ms
+
+	$ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
+	Bubble sorting array of 30000 elements
+	12543 ms
+	[ perf record: Woken up 35 times to write data ]
+	[ perf record: Captured and wrote 69.640 MB perf.data ]
+
+	$ perf inject -i perf.data -o inj.data --itrace=il64 --strip
+	$ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
+	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+	$ taskset -c 2 ./sort_autofdo
+	Bubble sorting array of 30000 elements
+	5806 ms
-- 
2.14.3
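
In perf's --itrace syntax, 'i' requests instruction events and 'l' attaches a last-branch stack to them. An optional number after 'l' gives the stack depth (64 by default), so '--itrace=il64' in the AutoFDO example requests a 64-entry branch stack. The cs-etm code keeps these entries in a circular buffer (last_branch_rb in the previous patch), written newest-first. The C sketch below models such a ring. It is a simplified illustration, not perf's implementation, and the names struct branch_rb, rb_push() and rb_copy(), as well as the RB_MAX bound, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RB_MAX 1024	/* assumed upper bound on the stack depth */

struct branch {
	uint64_t from;
	uint64_t to;
};

struct branch_rb {
	struct branch entries[RB_MAX];
	size_t size;	/* configured depth, e.g. 64 for "l64" */
	size_t pos;	/* index of the newest entry */
	size_t nr;	/* number of valid entries, never more than size */
};

static void rb_init(struct branch_rb *rb, size_t size)
{
	assert(size > 0 && size <= RB_MAX);
	rb->size = size;
	rb->pos = 0;
	rb->nr = 0;
}

/* Record newest-first: step the insert position backwards, wrapping at 0. */
static void rb_push(struct branch_rb *rb, uint64_t from, uint64_t to)
{
	if (rb->pos == 0)
		rb->pos = rb->size;
	rb->pos--;

	rb->entries[rb->pos].from = from;
	rb->entries[rb->pos].to = to;

	if (rb->nr < rb->size)
		rb->nr++;	/* once full, the oldest entry is overwritten */
}

/* Linearize newest-first into out[], which must hold rb->nr entries. */
static size_t rb_copy(const struct branch_rb *rb, struct branch *out)
{
	size_t n = 0;

	/* From the newest entry up to the end of the buffer... */
	for (size_t i = rb->pos; i < rb->size && n < rb->nr; i++)
		out[n++] = rb->entries[i];
	/* ...then wrap around to the start of the buffer. */
	for (size_t i = 0; i < rb->pos && n < rb->nr; i++)
		out[n++] = rb->entries[i];

	return n;
}
```

As an example, push branches 1, 2, 3 and 4 into a ring of depth 3 and then linearize it. The result is 4, 3, 2: once the ring is full, each new entry overwrites the oldest one.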


* [GIT PULL 00/41] perf/core improvements and fixes
  2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (6 preceding siblings ...)
  2018-02-16 19:17 ` [PATCH 29/41] coresight: Update documentation for perf usage Arnaldo Carvalho de Melo
@ 2018-02-17 10:49 ` Ingo Molnar
  7 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2018-02-17 10:49 UTC (permalink / raw)
  To: linux-arm-kernel


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling, this is on top of tip/perf/urgent.
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 297f9233b53a08fd457815e19f1d6f2c3389857b:
> 
>   kprobes: Propagate error from disarm_kprobe_ftrace() (2018-02-16 09:12:58 +0100)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180216
> 
> for you to fetch changes up to 21316ac6803d4a1aadd74b896db8d60a92cd1140:
> 
>   perf tests shell lib: Use a wildcard to remove the vfs_getname probe (2018-02-16 15:31:12 -0300)
> 
> ----------------------------------------------------------------
> perf/core improvements and fixes:
> 
> - Fix wrong jump arrow in systems with branch records with cycles,
>   i.e. Intel's >= Skylake (Jin Yao)
> 
> - Fix 'perf record --per-thread' problem introduced when
>   implementing 'perf stat --per-thread' (Jin Yao)
> 
> - Use arch__compare_symbol_names() to fix 'perf test vmlinux',
>   that was using strcmp(symbol names) while the dso routines
>   doing symbol lookups used the arch overridable one, making
>   this test fail on architectures that override that function
>   with something other than strcmp() (Jiri Olsa)
> 
> - Add 'perf script --show-round-event' to display
>   PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)
> 
> - Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)
> 
> - Use ordered_events for 'perf report --tasks', otherwise we may get
>   artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
>   (when they got recorded in different CPUs) (Jiri Olsa)
> 
> - Add support to display group output for non-group events, i.e.
>   when one uses 'perf report --group' on a perf.data file
>   recorded without explicitly grouping events with {} (e.g.
>   "perf record -e '{cycles,instructions}'"), one now gets the same
>   output a grouped recording would produce, i.e. all those non-grouped
>   events are shown in multiple columns, at the same time (Jiri Olsa)
> 
> - Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)
> 
> - Kernel maps fixes wrt perf.data (report) versus live system (top)
>   (Jiri Olsa)
> 
> - Fix memory corruption when using 'perf record -j call -g -a <application>'
>   followed by 'perf report --branch-history' (Jiri Olsa)
> 
> - ARM CoreSight fixes (Mathieu Poirier)
> 
> - Add inject capability for CoreSight Traces (Robert Walker)
> 
> - Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)
> 
> - Man pages fixes (Sangwon Hong, Jaecheol Shin)
> 
> - Fix some 'perf test' cases on s/390 and x86_64 (some backtraces
>   changed with a glibc update) (Thomas Richter)
> 
> - Add detailed CPUID info in the 'perf.data' headers for s/390 to
>   then use it in 'perf annotate' (Thomas Richter)
> 
> - Add '--interval-count N' to 'perf stat', to use with -I, i.e.
>   'perf stat -I 1000 --interval-count 2' will show stats every
>   1000ms, two times (yuzhoujian)
> 
> - Add 'perf stat --timeout Nms', that will run for that many
>   milliseconds and then stop, printing the counters (yuzhoujian)
> 
> - Fix description for 'perf report --mem-mode' (Andi Kleen)
> 
> - Use a wildcard to remove the vfs_getname probe in the
>   'perf test' shell based test cases (Arnaldo Carvalho de Melo)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Andi Kleen (1):
>       perf report: Fix description for --mem-mode
> 
> Arnaldo Carvalho de Melo (1):
>       perf tests shell lib: Use a wildcard to remove the vfs_getname probe
> 
> Jaecheol Shin (1):
>       perf annotate: Add missing arguments in Man page
> 
> Jin Yao (2):
>       perf tools: Use target->per_thread and target->system_wide flags
>       perf report: Fix wrong jump arrow
> 
> Jiri Olsa (18):
>       perf record: Put new line after target override warning
>       perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
>       tools lib api fs: Add filename__read_xll function
>       tools lib api fs: Add sysfs__read_xll function
>       perf tests: Fix dwarf unwind for stripped binaries
>       perf tools: Fix comment for sort__* compare functions
>       perf report: Ask for ordered events for --tasks option
>       perf report: Add support to display group output for non group events
>       tools lib symbol: Skip non-address kallsyms line
>       perf symbols: Check if we read regular file in dso__load()
>       perf machine: Free root_dir in machine__init() error path
>       perf machine: Move kernel mmap name into struct machine
>       perf machine: Generalize machine__set_kernel_mmap()
>       perf machine: Don't search for active kernel start in __machine__create_kernel_maps
>       perf machine: Remove machine__load_kallsyms()
>       perf tools: Do not create kernel maps in sample__resolve()
>       perf tests: Use arch__compare_symbol_names to compare symbols
>       perf report: Fix memory corruption in --branch-history mode --branch-history
> 
> Mathieu Poirier (3):
>       perf cs-etm: Freeing allocated memory
>       perf auxtrace arm: Fixing uninitialised variable
>       perf cs-etm: Properly deal with cpu maps
> 
> Ravi Bangoria (3):
>       tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
>       perf powerpc: Generate system call table from asm/unistd.h
>       perf trace powerpc: Use generated syscall table
> 
> Robert Walker (3):
>       perf cs-etm: Inject capabilitity for CoreSight traces
>       perf inject: Emit instruction records on ETM trace discontinuity
>       coresight: Update documentation for perf usage
> 
> Sangwon Hong (2):
>       perf kmem: Document a missing option & an argument
>       perf mem: Document a missing option
> 
> Thomas Richter (5):
>       perf record: Provide detailed information on s390 CPU
>       perf annotate: Scan cpuid for s390 and save machine type
>       perf cpuid: Introduce a platform specific cpuid compare function
>       perf test: Fix test case 23 for s390 z/VM or KVM guests
>       perf test: Fix test case inet_pton to accept inlines.
> 
> yuzhoujian (2):
>       perf stat: Add support to print counts for fixed times
>       perf stat: Add support to print counts after a period of time
> 
>  Documentation/trace/coresight.txt                  |  51 +++
>  tools/arch/powerpc/include/uapi/asm/unistd.h       | 402 +++++++++++++++++
>  tools/lib/api/fs/fs.c                              |  44 +-
>  tools/lib/api/fs/fs.h                              |   2 +
>  tools/lib/symbol/kallsyms.c                        |   4 +
>  tools/perf/Documentation/perf-annotate.txt         |   6 +-
>  tools/perf/Documentation/perf-kmem.txt             |   6 +-
>  tools/perf/Documentation/perf-mem.txt              |   4 +
>  tools/perf/Documentation/perf-report.txt           |   5 +-
>  tools/perf/Documentation/perf-script.txt           |   3 +
>  tools/perf/Documentation/perf-stat.txt             |  10 +
>  tools/perf/Makefile.config                         |   2 +
>  tools/perf/arch/arm/util/auxtrace.c                |   2 +-
>  tools/perf/arch/arm/util/cs-etm.c                  |  51 ++-
>  tools/perf/arch/powerpc/Makefile                   |  25 ++
>  .../perf/arch/powerpc/entry/syscalls/mksyscalltbl  |  37 ++
>  tools/perf/arch/s390/annotate/instructions.c       |  27 +-
>  tools/perf/arch/s390/util/header.c                 | 148 ++++++-
>  tools/perf/builtin-record.c                        |   2 +-
>  tools/perf/builtin-report.c                        |   7 +-
>  tools/perf/builtin-script.c                        |  17 +
>  tools/perf/builtin-stat.c                          |  53 ++-
>  tools/perf/check-headers.sh                        |   1 +
>  tools/perf/tests/code-reading.c                    |  33 +-
>  tools/perf/tests/dwarf-unwind.c                    |  46 +-
>  tools/perf/tests/shell/lib/probe_vfs_getname.sh    |   2 +-
>  .../perf/tests/shell/trace+probe_libc_inet_pton.sh |   6 +-
>  tools/perf/tests/vmlinux-kallsyms.c                |   4 +-
>  tools/perf/ui/browsers/annotate.c                  |   9 +-
>  tools/perf/util/build-id.c                         |  10 +-
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  74 +++-
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.h    |   2 +
>  tools/perf/util/cs-etm.c                           | 478 ++++++++++++++++++---
>  tools/perf/util/event.c                            |  16 +-
>  tools/perf/util/evlist.c                           |  21 +-
>  tools/perf/util/header.h                           |   1 +
>  tools/perf/util/hist.c                             |   4 +-
>  tools/perf/util/hist.h                             |   1 -
>  tools/perf/util/machine.c                          | 145 +++----
>  tools/perf/util/machine.h                          |   6 +-
>  tools/perf/util/pmu.c                              |  47 +-
>  tools/perf/util/sort.c                             |   7 +-
>  tools/perf/util/stat.h                             |   2 +
>  tools/perf/util/symbol.c                           |  13 +-
>  tools/perf/util/syscalltbl.c                       |   8 +
>  tools/perf/util/thread_map.c                       |   4 +-
>  tools/perf/util/thread_map.h                       |   2 +-
>  47 files changed, 1577 insertions(+), 273 deletions(-)
>  create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h
>  create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl

Pulled, thanks a lot Arnaldo!

	Ingo



Thread overview: 9+ messages
2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 20/41] perf cs-etm: Freeing allocated memory Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 21/41] perf tools: Use target->per_thread and target->system_wide flags Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 22/41] perf auxtrace arm: Fixing uninitialised variable Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 23/41] perf cs-etm: Properly deal with cpu maps Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 27/41] perf cs-etm: Inject capabilitity for CoreSight traces Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 28/41] perf inject: Emit instruction records on ETM trace discontinuity Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 29/41] coresight: Update documentation for perf usage Arnaldo Carvalho de Melo
2018-02-17 10:49 ` [GIT PULL 00/41] perf/core improvements and fixes Ingo Molnar
