From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752616AbdBTKFj (ORCPT ); Mon, 20 Feb 2017 05:05:39 -0500 Received: from mail-wr0-f196.google.com ([209.85.128.196]:36232 "EHLO mail-wr0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752116AbdBTKE6 (ORCPT ); Mon, 20 Feb 2017 05:04:58 -0500 Date: Mon, 20 Feb 2017 11:04:49 +0100 From: Ingo Molnar To: Linus Torvalds Cc: linux-kernel@vger.kernel.org, Peter Zijlstra , Arnaldo Carvalho de Melo , Thomas Gleixner , Andrew Morton , Alexander Shishkin , Jiri Olsa Subject: [GIT PULL] perf updates for v4.11 Message-ID: <20170220100449.GA21030@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Linus, Please pull the latest perf-core-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-core-for-linus # HEAD: 0c8967c9df230d2c4dde6649f410b62e01806c22 Merge tag 'perf-core-for-mingo-4.11-20170215' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core On the kernel side the main changes in this cycle were: - Add Intel Kaby Lake CPU support (Srinivas Pandruvada) - AMD uncore driver updates for fam17 (Janakarajan Natarajan) - Intel/PT updates and core events optimizations and cleanups (Alexander Shishkin) - cgroups events fixes (David Carrillo-Cisneros) - kprobes improvements (Masami Hiramatsu) - ... plus misc fixes and updates. On the tooling side the main changes were: - Support clang build in tools/{perf,lib/{bpf,traceevent,api}} with CC=clang, to, for instance, take advantage of better warnings (Arnaldo Carvalho de Melo): - Introduce the 'delta-abs' 'perf diff' compute method, that orders the histogram entries by the absolute value of the percentage delta for a function in two perf.data files, i.e. the functions that changed the most (increase or decrease in samples) comes first (Namhyung Kim) - Add support for parsing Intel uncore vendor event files and add uncore vendor events for the Intel server processors (Haswell, Broadwell, IvyBridge), Xeon Phi (Knights Landing) and Broadwell DE (Andi Kleen) - Introduce 'perf ftrace' a perf front end to the kernel's ftrace function and function_graph tracer, defaulting to the "function_graph" tracer, more work will be done in reviving this effort, forward porting it from its initial patch submission (Namhyung Kim) - Add 'e' and 'c' hotkeys to expand/collapse call chains for a single hist entry in the 'perf report' and 'perf top' TUI (Jiri Olsa) - Account thread wait time (off CPU time) separately: sleep, iowait and preempt, based on the prev_state of the last event, show the breakdown when using "perf sched timehist --state" (Namhyumg Kim) - Add more triggers to switch the output file (perf.data.TIMESTAMP). Now, in addition to switching to a different output file when receiving a SIGUSR2, one can also specify file size and time based triggers: perf record -a --switch-output=signal is equivalent to what we had before: perf record -a --switch-output While we can also ask for the file to be "sliced" by size, taking into account that that will happen only when we get woken up by the kernel, i.e. one has to take into account the --mmap-pages (the size of the perf mmap ring buffer): perf record -a --switch-output=2G will break the perf.data output into multiple files limited to 2GB of samples, right when generating the output. For time based samples, alert() will be used, so to have 1 minute limited perf.data output files: perf record -a --switch-output=1m (Jiri Olsa) - Improve 'perf trace' (Arnaldo Carvalho de Melo) - 'perf kallsyms' toy tool to look for extended symbol information on the running kernel and demonstrate the machine/thread/symbol APIs for use in other tools, such as 'perf probe' (Arnaldo Carvalho de Melo) - ... plus tons of other changes, see the shortlog and Git log for details. Thanks, Ingo ------------------> Alexander Shishkin (5): perf/core: Don't re-schedule CPU flexible events needlessly perf/core: Optimize event rescheduling on active contexts perf/x86/intel/pt: Add format strings for PTWRITE and power event tracing perf/core: Do error out on a kernel filter on an exclude_filter event perf/core: Allow kernel filters on CPU events Andi Kleen (12): perf pmu: Factor out scale conversion code perf jevents: Parse eventcode as number perf jevents: Add support for parsing uncore json files perf pmu: Support per pmu json aliases perf pmu: Support event aliases for non cpu// pmus perf list: Add debug support for outputing alias string perf vendor events intel: Add uncore events for Haswell Server processor perf vendor events intel: Add uncore events for Broadwell Server perf vendor events intel: Add uncore events for IvyBridge Server perf vendor events intel: Add uncore events for Sandy Bridge Server perf vendor events intel: Add uncore events for Xeon Phi (Knights Landing) perf vendor events intel: Add uncore events for Broadwell DE Arnaldo Carvalho de Melo (39): tools lib subcmd: Add missing linux/kernel.h include to subcmd.h perf machine: Add a kallsyms loading constructor perf kallsyms: Introduce tool to look for extended symbol information on the running kernel perf trace: Allow specifying list of syscalls and events in -e/--expr/--event tools: Sync x86's vmx.h with the kernel perf scripting perl: Do not die() when not founding event for a type perf ftrace: Make 'function_graph' be the default tracer perf config: Do not consider an error not to have any perfconfig file perf tools: Propagate perf_config() errors perf tools: Fix include of linux/mman.h tools include: Add a __fallthrough statement tools string: Use __fallthrough in perf_atoll() tools strfilter: Use __fallthrough perf top: Use __fallthrough perf thread_map: Correctly size buffer used with dirent->dt_name perf header: Fix handling of PERF_EVENT_UPDATE__SCALE perf bench numa: Avoid possible truncation when using snprintf() perf tests: Avoid possible truncation with dirent->d_name + snprintf perf intel-pt: Use __fallthrough tools include: Introduce linux/compiler-gcc.h tools lib traceevent plugin function: Initialize 'index' variable perf evsel: Inform how to make a sysctl setting permanent perf symbols: No need to check if sym->name is NULL perf tests record: No need to test an array against NULL perf symbols: dso->name is an array, no need to check it against NULL tools: Suppress request for warning options not existent in clang tools: Set the maximum optimization level according to the compiler being used tools lib subcmd: Make it an error to pass a signed value to OPTION_UINTEGER Revert "perf bench futex: Sanitize numeric parameters" perf bench numa: Make sure dprintf() is not defined perf tests: Synthesize struct instead of using field after variable sized type perf record: Do not put a variable sized type not at the end of a struct perf tools: Do not put a variable sized type not at the end of a struct perf probe: Avoid accessing uninitialized 'map' variable perf evsel: Do not put a variable sized type not at the end of a struct perf intel pt decoder: clang has no -Wno-override-init perf tools: Be consistent on the type of map->symbols[] interator perf pmu: Fix check for unset alias->unit array perf tools: Add missing parse_events_error() prototype Borislav Petkov (1): perf/x86/events: Add an AMD-specific Makefile David Carrillo-Cisneros (4): perf tools: Remove unneccessary feature-dwarf warning perf/core: Make cgroup switch visit only cpuctxs with cgroup events perf/core: Remove perf_cpu_context::unique_pmu tools lib traceevent: Robustify do_generate_dynamic_list_file He Kuang (3): perf probe: Fix wrong register name for arm64 perf tools arm64: Add support for generating bpf prologue perf bpf: Add missing newline in debug messages Ingo Molnar (1): tools headers: Sync {tools/,}arch/powerpc/include/uapi/asm/kvm.h, {tools/,}arch/x86/include/asm/cpufeatures.h and {tools/,}arch/arm/include/uapi/asm/kvm.h Janakarajan Natarajan (3): perf/x86/amd/uncore: Rename 'L2' to 'LLC' perf/x86/amd/uncore: Update the number of uncore counters perf/x86/amd/uncore: Update sysfs attributes for Family17h processors Jiri Olsa (10): perf tools: Add unit_number__scnprintf function perf record: Add struct switch_output perf record: Change switch-output option to take optional argument perf record: Add switch-output size option argument perf record: Add switch-output size warning perf record: Add switch-output time option argument perf hists browser: Put hist_entry folding logic into single function perf hists browser: Add e/c hotkeys to expand/collapse callchain for current entry perf c2c report: Display Total records column in offset view perf c2c report: Coalesce by default only by pid,iaddr Joe Stringer (10): tools lib bpf: Fix map offsets in relocation tools lib bpf: Define prog_type fns with macro tools lib bpf: Add set/is helpers for all prog types tools lib bpf: Add libbpf_get_error() tools lib bpf: Add BPF program pinning APIs tools lib bpf: Add bpf_map__pin() tools lib bpf: Add bpf_object__pin() tools perf util: Make rm_rf(path) argument const tools lib api fs: Add bpf_fs filesystem detector perf test: Add libbpf pinning test Josh Poimboeuf (1): tools build: Add tools tree support for 'make -s' Kan Liang (1): perf/core: Try parent PMU first when initializing a child event Krister Johansen (1): perf callchain: Reference count maps Laura Abbott (1): perf jvmti: Create libdir directory before installing libperf-jvmti.so Markus Elfring (2): perf probe: Delete an unnecessary check in try_to_find_absolute_address() perf probe: Delete an unnecessary assignment in try_to_find_absolute_address() Masami Hiramatsu (3): kprobes, extable: Identify kprobes trampolines as kernel text area kprobes/arm64: Remove a redundant dependency from the Kconfig kprobes/x86: Use hlist_for_each_entry() instead of hlist_for_each_entry_safe() Matija Glavinic Pecotic (1): perf unwind: Fix looking up dwarf unwind stack info Michael Petlan (1): perf script: Fix man page about --dump-raw-trace option Mickaël Salaün (4): tools lib bpf: Add missing header to the library samples/bpf: Add missing header samples/bpf: Ignore already processed ELF sections samples/bpf: Reset global variables Namhyung Kim (10): perf sched timehist: Account thread wait time separately perf sched timehist: Add --state option perf sched timehist: Show total wait times for summary perf util: Save pid-cmdline mapping into tracing header perf util: Add more debug message on failure path perf ftrace: Introduce new 'ftrace' tool perf diff: Add 'delta-abs' compute method perf diff: Add diff.order config option perf diff: Add diff.compute config option perf diff: Change default setting to "delta-abs" Ravi Bangoria (1): perf sdt: Show proper hint when event not yet in place via 'perf probe' Soramichi AKIYAMA (3): tools lib subcmd: Fix missing member name perf tools: Move two variables usied in libperf from perf.c perf evlist: Fix typo in deliver_sample() Soramichi Akiyama (1): perf evlist: Fix typo in perf_evlist__start_workload() Srinivas Pandruvada (1): perf/x86/intel: Add Kaby Lake support Steven Rostedt (VMware) (1): tools lib traceevent: Initialize lenght on OLD_RING_BUFFER_TYPE_TIME_STAMP Taeung Song (7): perf ftrace: Remove needless code setting default tracer perf tools: Create for_each_event macro for tracepoints iteration perf ftrace: Add ftrace.tracer config option perf tools: Only increase index if perf_evsel__new_idx() succeeds perf tools: Add missing check for failure in a zalloc() call perf tools: Use zfree() instead of ad hoc equivalent perf tools: Use zfree() to avoid keeping dangling pointers Uwe Kleine-König (1): perf probe: Add option --symfs Victor Kamensky (1): perf symbols: Take into account symfs setting when reading file build ID Wang YanQing (1): perf scripting perl: Fix compile error with some perl5 versions Yannick Brosseau (1): perf script: Also allow forcing reading of non-root owned files by root Makefile | 6 +- arch/arm64/Kconfig | 2 +- arch/x86/events/Makefile | 13 +- arch/x86/events/amd/Makefile | 7 + arch/x86/events/amd/uncore.c | 204 ++++++++----- arch/x86/events/intel/cstate.c | 3 + arch/x86/events/intel/pt.c | 6 + arch/x86/events/intel/rapl.c | 3 + arch/x86/events/intel/uncore.c | 2 + arch/x86/kernel/kprobes/core.c | 2 +- include/linux/kprobes.h | 30 +- include/linux/perf_event.h | 4 +- kernel/events/core.c | 266 ++++++++++------- kernel/extable.c | 9 +- kernel/kprobes.c | 73 +++-- samples/bpf/bpf_load.c | 7 + samples/bpf/tracex5_kern.c | 1 + tools/arch/arm/include/uapi/asm/kvm.h | 9 + tools/arch/powerpc/include/uapi/asm/kvm.h | 5 + tools/arch/x86/include/asm/cpufeatures.h | 11 + tools/arch/x86/include/uapi/asm/vmx.h | 5 + tools/build/Makefile.build | 10 + tools/include/linux/compiler-gcc.h | 14 + tools/include/linux/compiler.h | 9 + tools/lib/api/Makefile | 8 +- tools/lib/api/fs/fs.c | 16 + tools/lib/api/fs/fs.h | 1 + tools/lib/api/fs/tracing_path.c | 32 +- tools/lib/bpf/bpf.h | 1 + tools/lib/bpf/libbpf.c | 264 +++++++++++++++-- tools/lib/bpf/libbpf.h | 19 +- tools/lib/subcmd/Makefile | 8 +- tools/lib/subcmd/parse-options.c | 4 + tools/lib/subcmd/parse-options.h | 19 +- tools/lib/traceevent/Makefile | 14 +- tools/lib/traceevent/kbuffer-parse.c | 1 + tools/lib/traceevent/plugin_function.c | 2 +- tools/perf/Build | 5 +- tools/perf/Documentation/perf-c2c.txt | 2 +- tools/perf/Documentation/perf-config.txt | 12 + tools/perf/Documentation/perf-diff.txt | 15 +- tools/perf/Documentation/perf-ftrace.txt | 36 +++ tools/perf/Documentation/perf-kallsyms.txt | 24 ++ tools/perf/Documentation/perf-record.txt | 14 +- tools/perf/Documentation/perf-sched.txt | 2 + tools/perf/Documentation/perf-script.txt | 4 +- tools/perf/Documentation/perf-trace.txt | 8 +- tools/perf/MANIFEST | 1 + tools/perf/Makefile.config | 10 +- tools/perf/Makefile.perf | 1 + tools/perf/arch/arm64/Makefile | 1 + tools/perf/arch/arm64/include/dwarf-regs-table.h | 12 +- tools/perf/arch/arm64/util/dwarf-regs.c | 15 +- tools/perf/bench/futex-hash.c | 4 - tools/perf/bench/futex-lock-pi.c | 3 - tools/perf/bench/futex-requeue.c | 2 - tools/perf/bench/futex-wake-parallel.c | 4 - tools/perf/bench/futex-wake.c | 3 - tools/perf/bench/futex.h | 4 - tools/perf/bench/numa.c | 7 +- tools/perf/builtin-c2c.c | 3 +- tools/perf/builtin-diff.c | 78 ++++- tools/perf/builtin-ftrace.c | 265 +++++++++++++++++ tools/perf/builtin-help.c | 8 +- tools/perf/builtin-kallsyms.c | 67 +++++ tools/perf/builtin-kmem.c | 12 +- tools/perf/builtin-list.c | 3 + tools/perf/builtin-probe.c | 2 + tools/perf/builtin-record.c | 177 +++++++++-- tools/perf/builtin-report.c | 4 +- tools/perf/builtin-sched.c | 132 ++++++++- tools/perf/builtin-script.c | 3 +- tools/perf/builtin-stat.c | 2 +- tools/perf/builtin-top.c | 8 +- tools/perf/builtin-trace.c | 120 ++++++-- tools/perf/builtin.h | 2 + tools/perf/command-list.txt | 2 + tools/perf/perf.c | 20 +- .../arch/x86/broadwellde/uncore-cache.json | 317 ++++++++++++++++++++ .../arch/x86/broadwellde/uncore-memory.json | 83 ++++++ .../arch/x86/broadwellde/uncore-power.json | 84 ++++++ .../arch/x86/broadwellx/uncore-cache.json | 317 ++++++++++++++++++++ .../arch/x86/broadwellx/uncore-interconnect.json | 28 ++ .../arch/x86/broadwellx/uncore-memory.json | 83 ++++++ .../arch/x86/broadwellx/uncore-power.json | 84 ++++++ .../pmu-events/arch/x86/haswellx/uncore-cache.json | 317 ++++++++++++++++++++ .../arch/x86/haswellx/uncore-interconnect.json | 28 ++ .../arch/x86/haswellx/uncore-memory.json | 83 ++++++ .../pmu-events/arch/x86/haswellx/uncore-power.json | 84 ++++++ .../pmu-events/arch/x86/ivytown/uncore-cache.json | 322 +++++++++++++++++++++ .../arch/x86/ivytown/uncore-interconnect.json | 46 +++ .../pmu-events/arch/x86/ivytown/uncore-memory.json | 75 +++++ .../pmu-events/arch/x86/ivytown/uncore-power.json | 249 ++++++++++++++++ .../pmu-events/arch/x86/jaketown/uncore-cache.json | 209 +++++++++++++ .../arch/x86/jaketown/uncore-interconnect.json | 46 +++ .../arch/x86/jaketown/uncore-memory.json | 79 +++++ .../pmu-events/arch/x86/jaketown/uncore-power.json | 248 ++++++++++++++++ .../arch/x86/knightslanding/uncore-memory.json | 42 +++ tools/perf/pmu-events/jevents.c | 84 +++++- tools/perf/pmu-events/jevents.h | 4 +- tools/perf/pmu-events/pmu-events.h | 3 + tools/perf/tests/Build | 1 + tools/perf/tests/bpf.c | 42 ++- tools/perf/tests/builtin-test.c | 4 + tools/perf/tests/llvm.c | 2 +- tools/perf/tests/parse-events.c | 8 +- tools/perf/tests/parse-no-sample-id-all.c | 19 +- tools/perf/tests/perf-record.c | 2 +- tools/perf/tests/tests.h | 1 + tools/perf/tests/unit_number__scnprintf.c | 37 +++ tools/perf/ui/browsers/hists.c | 60 ++-- tools/perf/ui/setup.c | 1 + tools/perf/util/Build | 1 + tools/perf/util/bpf-loader.c | 4 +- tools/perf/util/callchain.c | 16 +- tools/perf/util/config.c | 23 +- tools/perf/util/data-convert-bt.c | 7 +- tools/perf/util/dso.c | 48 ++- tools/perf/util/event.c | 2 +- tools/perf/util/evlist.c | 12 +- tools/perf/util/evlist.h | 2 + tools/perf/util/evsel.c | 66 ++--- tools/perf/util/evsel_fprintf.c | 1 - tools/perf/util/header.c | 7 +- tools/perf/util/hist.c | 4 +- tools/perf/util/intel-pt-decoder/Build | 6 +- .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 5 + .../util/intel-pt-decoder/intel-pt-pkt-decoder.c | 2 + tools/perf/util/intel-pt.c | 4 +- tools/perf/util/llvm-utils.c | 4 +- tools/perf/util/machine.c | 25 +- tools/perf/util/machine.h | 1 + tools/perf/util/map.c | 4 +- tools/perf/util/parse-events.c | 84 +++--- tools/perf/util/parse-events.y | 37 ++- tools/perf/util/pmu.c | 113 +++++--- tools/perf/util/pmu.h | 1 + tools/perf/util/probe-event.c | 13 +- tools/perf/util/scripting-engines/Build | 2 +- .../perf/util/scripting-engines/trace-event-perl.c | 10 +- tools/perf/util/session.c | 4 +- tools/perf/util/strfilter.c | 1 + tools/perf/util/string.c | 2 + tools/perf/util/symbol.c | 6 +- tools/perf/util/symbol_fprintf.c | 2 +- tools/perf/util/thread_map.c | 2 +- tools/perf/util/trace-event-info.c | 71 +++-- tools/perf/util/trace-event-parse.c | 17 ++ tools/perf/util/trace-event-read.c | 77 ++++- tools/perf/util/trace-event.h | 1 + tools/perf/util/unwind-libunwind-local.c | 54 +++- tools/perf/util/util.c | 15 +- tools/perf/util/util.h | 3 +- tools/scripts/Makefile.include | 17 +- 154 files changed, 5373 insertions(+), 682 deletions(-) create mode 100644 arch/x86/events/amd/Makefile create mode 100644 tools/include/linux/compiler-gcc.h create mode 100644 tools/perf/Documentation/perf-ftrace.txt create mode 100644 tools/perf/Documentation/perf-kallsyms.txt create mode 100644 tools/perf/builtin-ftrace.c create mode 100644 tools/perf/builtin-kallsyms.c create mode 100644 tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json create mode 100644 tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json create mode 100644 tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json create mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json create mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json create mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json create mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json create mode 100644 tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json create mode 100644 tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json create mode 100644 tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json create mode 100644 tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json create mode 100644 tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json create mode 100644 tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json create mode 100644 tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json create mode 100644 tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json create mode 100644 tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json create mode 100644 tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json create mode 100644 tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json create mode 100644 tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json create mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json create mode 100644 tools/perf/tests/unit_number__scnprintf.c diff --git a/Makefile b/Makefile index 503dae1de8ef..6dc3e80616e3 100644 --- a/Makefile +++ b/Makefile @@ -87,10 +87,12 @@ endif ifneq ($(filter 4.%,$(MAKE_VERSION)),) # make-4 ifneq ($(filter %s ,$(firstword x$(MAKEFLAGS))),) quiet=silent_ + tools_silent=s endif else # make-3.8x ifneq ($(filter s% -s%,$(MAKEFLAGS)),) quiet=silent_ + tools_silent=-s endif endif @@ -1607,11 +1609,11 @@ endif # Clear a bunch of variables before executing the submake tools/: FORCE $(Q)mkdir -p $(objtree)/tools - $(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(filter --j% -j,$(MAKEFLAGS))" O=$(shell cd $(objtree) && /bin/pwd) subdir=tools -C $(src)/tools/ + $(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(tools_silent) $(filter --j% -j,$(MAKEFLAGS))" O=$(shell cd $(objtree) && /bin/pwd) subdir=tools -C $(src)/tools/ tools/%: FORCE $(Q)mkdir -p $(objtree)/tools - $(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(filter --j% -j,$(MAKEFLAGS))" O=$(shell cd $(objtree) && /bin/pwd) subdir=tools -C $(src)/tools/ $* + $(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(tools_silent) $(filter --j% -j,$(MAKEFLAGS))" O=$(shell cd $(objtree) && /bin/pwd) subdir=tools -C $(src)/tools/ $* # Single targets # --------------------------------------------------------------------------- diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 111742126897..f7dfd6d58659 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -96,7 +96,7 @@ config ARM64 select HAVE_RCU_TABLE_FREE select HAVE_SYSCALL_TRACEPOINTS select HAVE_KPROBES - select HAVE_KRETPROBES if HAVE_KPROBES + select HAVE_KRETPROBES select IOMMU_DMA if IOMMU_SUPPORT select IRQ_DOMAIN select IRQ_FORCED_THREADING diff --git a/arch/x86/events/Makefile b/arch/x86/events/Makefile index 1d392c39fe56..b8ccdb5c9244 100644 --- a/arch/x86/events/Makefile +++ b/arch/x86/events/Makefile @@ -1,11 +1,4 @@ -obj-y += core.o - -obj-$(CONFIG_CPU_SUP_AMD) += amd/core.o amd/uncore.o -obj-$(CONFIG_PERF_EVENTS_AMD_POWER) += amd/power.o -obj-$(CONFIG_X86_LOCAL_APIC) += amd/ibs.o msr.o -ifdef CONFIG_AMD_IOMMU -obj-$(CONFIG_CPU_SUP_AMD) += amd/iommu.o -endif - -obj-$(CONFIG_CPU_SUP_INTEL) += msr.o +obj-y += core.o +obj-y += amd/ +obj-$(CONFIG_X86_LOCAL_APIC) += msr.o obj-$(CONFIG_CPU_SUP_INTEL) += intel/ diff --git a/arch/x86/events/amd/Makefile b/arch/x86/events/amd/Makefile new file mode 100644 index 000000000000..b1da46f396e0 --- /dev/null +++ b/arch/x86/events/amd/Makefile @@ -0,0 +1,7 @@ +obj-$(CONFIG_CPU_SUP_AMD) += core.o uncore.o +obj-$(CONFIG_PERF_EVENTS_AMD_POWER) += power.o +obj-$(CONFIG_X86_LOCAL_APIC) += ibs.o +ifdef CONFIG_AMD_IOMMU +obj-$(CONFIG_CPU_SUP_AMD) += iommu.o +endif + diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c index a0b1bdb3ad42..4d1f7f2d9aff 100644 --- a/arch/x86/events/amd/uncore.c +++ b/arch/x86/events/amd/uncore.c @@ -22,13 +22,17 @@ #define NUM_COUNTERS_NB 4 #define NUM_COUNTERS_L2 4 -#define MAX_COUNTERS NUM_COUNTERS_NB +#define NUM_COUNTERS_L3 6 +#define MAX_COUNTERS 6 #define RDPMC_BASE_NB 6 -#define RDPMC_BASE_L2 10 +#define RDPMC_BASE_LLC 10 #define COUNTER_SHIFT 16 +static int num_counters_llc; +static int num_counters_nb; + static HLIST_HEAD(uncore_unused_list); struct amd_uncore { @@ -45,30 +49,30 @@ struct amd_uncore { }; static struct amd_uncore * __percpu *amd_uncore_nb; -static struct amd_uncore * __percpu *amd_uncore_l2; +static struct amd_uncore * __percpu *amd_uncore_llc; static struct pmu amd_nb_pmu; -static struct pmu amd_l2_pmu; +static struct pmu amd_llc_pmu; static cpumask_t amd_nb_active_mask; -static cpumask_t amd_l2_active_mask; +static cpumask_t amd_llc_active_mask; static bool is_nb_event(struct perf_event *event) { return event->pmu->type == amd_nb_pmu.type; } -static bool is_l2_event(struct perf_event *event) +static bool is_llc_event(struct perf_event *event) { - return event->pmu->type == amd_l2_pmu.type; + return event->pmu->type == amd_llc_pmu.type; } static struct amd_uncore *event_to_amd_uncore(struct perf_event *event) { if (is_nb_event(event) && amd_uncore_nb) return *per_cpu_ptr(amd_uncore_nb, event->cpu); - else if (is_l2_event(event) && amd_uncore_l2) - return *per_cpu_ptr(amd_uncore_l2, event->cpu); + else if (is_llc_event(event) && amd_uncore_llc) + return *per_cpu_ptr(amd_uncore_llc, event->cpu); return NULL; } @@ -183,16 +187,16 @@ static int amd_uncore_event_init(struct perf_event *event) return -ENOENT; /* - * NB and L2 counters (MSRs) are shared across all cores that share the - * same NB / L2 cache. Interrupts can be directed to a single target - * core, however, event counts generated by processes running on other - * cores cannot be masked out. So we do not support sampling and - * per-thread events. + * NB and Last level cache counters (MSRs) are shared across all cores + * that share the same NB / Last level cache. Interrupts can be directed + * to a single target core, however, event counts generated by processes + * running on other cores cannot be masked out. So we do not support + * sampling and per-thread events. */ if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) return -EINVAL; - /* NB and L2 counters do not have usr/os/guest/host bits */ + /* NB and Last level cache counters do not have usr/os/guest/host bits */ if (event->attr.exclude_user || event->attr.exclude_kernel || event->attr.exclude_host || event->attr.exclude_guest) return -EINVAL; @@ -226,8 +230,8 @@ static ssize_t amd_uncore_attr_show_cpumask(struct device *dev, if (pmu->type == amd_nb_pmu.type) active_mask = &amd_nb_active_mask; - else if (pmu->type == amd_l2_pmu.type) - active_mask = &amd_l2_active_mask; + else if (pmu->type == amd_llc_pmu.type) + active_mask = &amd_llc_active_mask; else return 0; @@ -244,30 +248,47 @@ static struct attribute_group amd_uncore_attr_group = { .attrs = amd_uncore_attrs, }; -PMU_FORMAT_ATTR(event, "config:0-7,32-35"); -PMU_FORMAT_ATTR(umask, "config:8-15"); - -static struct attribute *amd_uncore_format_attr[] = { - &format_attr_event.attr, - &format_attr_umask.attr, - NULL, -}; - -static struct attribute_group amd_uncore_format_group = { - .name = "format", - .attrs = amd_uncore_format_attr, +/* + * Similar to PMU_FORMAT_ATTR but allowing for format_attr to be assigned based + * on family + */ +#define AMD_FORMAT_ATTR(_dev, _name, _format) \ +static ssize_t \ +_dev##_show##_name(struct device *dev, \ + struct device_attribute *attr, \ + char *page) \ +{ \ + BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ + return sprintf(page, _format "\n"); \ +} \ +static struct device_attribute format_attr_##_dev##_name = __ATTR_RO(_dev); + +/* Used for each uncore counter type */ +#define AMD_ATTRIBUTE(_name) \ +static struct attribute *amd_uncore_format_attr_##_name[] = { \ + &format_attr_event_##_name.attr, \ + &format_attr_umask.attr, \ + NULL, \ +}; \ +static struct attribute_group amd_uncore_format_group_##_name = { \ + .name = "format", \ + .attrs = amd_uncore_format_attr_##_name, \ +}; \ +static const struct attribute_group *amd_uncore_attr_groups_##_name[] = { \ + &amd_uncore_attr_group, \ + &amd_uncore_format_group_##_name, \ + NULL, \ }; -static const struct attribute_group *amd_uncore_attr_groups[] = { - &amd_uncore_attr_group, - &amd_uncore_format_group, - NULL, -}; +AMD_FORMAT_ATTR(event, , "config:0-7,32-35"); +AMD_FORMAT_ATTR(umask, , "config:8-15"); +AMD_FORMAT_ATTR(event, _df, "config:0-7,32-35,59-60"); +AMD_FORMAT_ATTR(event, _l3, "config:0-7"); +AMD_ATTRIBUTE(df); +AMD_ATTRIBUTE(l3); static struct pmu amd_nb_pmu = { .task_ctx_nr = perf_invalid_context, - .attr_groups = amd_uncore_attr_groups, - .name = "amd_nb", .event_init = amd_uncore_event_init, .add = amd_uncore_add, .del = amd_uncore_del, @@ -276,10 +297,8 @@ static struct pmu amd_nb_pmu = { .read = amd_uncore_read, }; -static struct pmu amd_l2_pmu = { +static struct pmu amd_llc_pmu = { .task_ctx_nr = perf_invalid_context, - .attr_groups = amd_uncore_attr_groups, - .name = "amd_l2", .event_init = amd_uncore_event_init, .add = amd_uncore_add, .del = amd_uncore_del, @@ -296,14 +315,14 @@ static struct amd_uncore *amd_uncore_alloc(unsigned int cpu) static int amd_uncore_cpu_up_prepare(unsigned int cpu) { - struct amd_uncore *uncore_nb = NULL, *uncore_l2; + struct amd_uncore *uncore_nb = NULL, *uncore_llc; if (amd_uncore_nb) { uncore_nb = amd_uncore_alloc(cpu); if (!uncore_nb) goto fail; uncore_nb->cpu = cpu; - uncore_nb->num_counters = NUM_COUNTERS_NB; + uncore_nb->num_counters = num_counters_nb; uncore_nb->rdpmc_base = RDPMC_BASE_NB; uncore_nb->msr_base = MSR_F15H_NB_PERF_CTL; uncore_nb->active_mask = &amd_nb_active_mask; @@ -312,18 +331,18 @@ static int amd_uncore_cpu_up_prepare(unsigned int cpu) *per_cpu_ptr(amd_uncore_nb, cpu) = uncore_nb; } - if (amd_uncore_l2) { - uncore_l2 = amd_uncore_alloc(cpu); - if (!uncore_l2) + if (amd_uncore_llc) { + uncore_llc = amd_uncore_alloc(cpu); + if (!uncore_llc) goto fail; - uncore_l2->cpu = cpu; - uncore_l2->num_counters = NUM_COUNTERS_L2; - uncore_l2->rdpmc_base = RDPMC_BASE_L2; - uncore_l2->msr_base = MSR_F16H_L2I_PERF_CTL; - uncore_l2->active_mask = &amd_l2_active_mask; - uncore_l2->pmu = &amd_l2_pmu; - uncore_l2->id = -1; - *per_cpu_ptr(amd_uncore_l2, cpu) = uncore_l2; + uncore_llc->cpu = cpu; + uncore_llc->num_counters = num_counters_llc; + uncore_llc->rdpmc_base = RDPMC_BASE_LLC; + uncore_llc->msr_base = MSR_F16H_L2I_PERF_CTL; + uncore_llc->active_mask = &amd_llc_active_mask; + uncore_llc->pmu = &amd_llc_pmu; + uncore_llc->id = -1; + *per_cpu_ptr(amd_uncore_llc, cpu) = uncore_llc; } return 0; @@ -376,17 +395,17 @@ static int amd_uncore_cpu_starting(unsigned int cpu) *per_cpu_ptr(amd_uncore_nb, cpu) = uncore; } - if (amd_uncore_l2) { + if (amd_uncore_llc) { unsigned int apicid = cpu_data(cpu).apicid; unsigned int nshared; - uncore = *per_cpu_ptr(amd_uncore_l2, cpu); + uncore = *per_cpu_ptr(amd_uncore_llc, cpu); cpuid_count(0x8000001d, 2, &eax, &ebx, &ecx, &edx); nshared = ((eax >> 14) & 0xfff) + 1; uncore->id = apicid - (apicid % nshared); - uncore = amd_uncore_find_online_sibling(uncore, amd_uncore_l2); - *per_cpu_ptr(amd_uncore_l2, cpu) = uncore; + uncore = amd_uncore_find_online_sibling(uncore, amd_uncore_llc); + *per_cpu_ptr(amd_uncore_llc, cpu) = uncore; } return 0; @@ -419,8 +438,8 @@ static int amd_uncore_cpu_online(unsigned int cpu) if (amd_uncore_nb) uncore_online(cpu, amd_uncore_nb); - if (amd_uncore_l2) - uncore_online(cpu, amd_uncore_l2); + if (amd_uncore_llc) + uncore_online(cpu, amd_uncore_llc); return 0; } @@ -456,8 +475,8 @@ static int amd_uncore_cpu_down_prepare(unsigned int cpu) if (amd_uncore_nb) uncore_down_prepare(cpu, amd_uncore_nb); - if (amd_uncore_l2) - uncore_down_prepare(cpu, amd_uncore_l2); + if (amd_uncore_llc) + uncore_down_prepare(cpu, amd_uncore_llc); return 0; } @@ -479,8 +498,8 @@ static int amd_uncore_cpu_dead(unsigned int cpu) if (amd_uncore_nb) uncore_dead(cpu, amd_uncore_nb); - if (amd_uncore_l2) - uncore_dead(cpu, amd_uncore_l2); + if (amd_uncore_llc) + uncore_dead(cpu, amd_uncore_llc); return 0; } @@ -492,6 +511,47 @@ static int __init amd_uncore_init(void) if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) goto fail_nodev; + switch(boot_cpu_data.x86) { + case 23: + /* Family 17h: */ + num_counters_nb = NUM_COUNTERS_NB; + num_counters_llc = NUM_COUNTERS_L3; + /* + * For Family17h, the NorthBridge counters are + * re-purposed as Data Fabric counters. Also, support is + * added for L3 counters. The pmus are exported based on + * family as either L2 or L3 and NB or DF. + */ + amd_nb_pmu.name = "amd_df"; + amd_llc_pmu.name = "amd_l3"; + format_attr_event_df.show = &event_show_df; + format_attr_event_l3.show = &event_show_l3; + break; + case 22: + /* Family 16h - may change: */ + num_counters_nb = NUM_COUNTERS_NB; + num_counters_llc = NUM_COUNTERS_L2; + amd_nb_pmu.name = "amd_nb"; + amd_llc_pmu.name = "amd_l2"; + format_attr_event_df = format_attr_event; + format_attr_event_l3 = format_attr_event; + break; + default: + /* + * All prior families have the same number of + * NorthBridge and Last Level Cache counters + */ + num_counters_nb = NUM_COUNTERS_NB; + num_counters_llc = NUM_COUNTERS_L2; + amd_nb_pmu.name = "amd_nb"; + amd_llc_pmu.name = "amd_l2"; + format_attr_event_df = format_attr_event; + format_attr_event_l3 = format_attr_event; + break; + } + amd_nb_pmu.attr_groups = amd_uncore_attr_groups_df; + amd_llc_pmu.attr_groups = amd_uncore_attr_groups_l3; + if (!boot_cpu_has(X86_FEATURE_TOPOEXT)) goto fail_nodev; @@ -510,16 +570,16 @@ static int __init amd_uncore_init(void) } if (boot_cpu_has(X86_FEATURE_PERFCTR_L2)) { - amd_uncore_l2 = alloc_percpu(struct amd_uncore *); - if (!amd_uncore_l2) { + amd_uncore_llc = alloc_percpu(struct amd_uncore *); + if (!amd_uncore_llc) { ret = -ENOMEM; - goto fail_l2; + goto fail_llc; } - ret = perf_pmu_register(&amd_l2_pmu, amd_l2_pmu.name, -1); + ret = perf_pmu_register(&amd_llc_pmu, amd_llc_pmu.name, -1); if (ret) - goto fail_l2; + goto fail_llc; - pr_info("perf: AMD L2I counters detected\n"); + pr_info("perf: AMD LLC counters detected\n"); ret = 0; } @@ -529,7 +589,7 @@ static int __init amd_uncore_init(void) if (cpuhp_setup_state(CPUHP_PERF_X86_AMD_UNCORE_PREP, "perf/x86/amd/uncore:prepare", amd_uncore_cpu_up_prepare, amd_uncore_cpu_dead)) - goto fail_l2; + goto fail_llc; if (cpuhp_setup_state(CPUHP_AP_PERF_X86_AMD_UNCORE_STARTING, "perf/x86/amd/uncore:starting", @@ -546,11 +606,11 @@ static int __init amd_uncore_init(void) cpuhp_remove_state(CPUHP_AP_PERF_X86_AMD_UNCORE_STARTING); fail_prep: cpuhp_remove_state(CPUHP_PERF_X86_AMD_UNCORE_PREP); -fail_l2: +fail_llc: if (boot_cpu_has(X86_FEATURE_PERFCTR_NB)) perf_pmu_unregister(&amd_nb_pmu); - if (amd_uncore_l2) - free_percpu(amd_uncore_l2); + if (amd_uncore_llc) + free_percpu(amd_uncore_llc); fail_nb: if (amd_uncore_nb) free_percpu(amd_uncore_nb); diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c index 1076c9a77292..aff4b5b69d40 100644 --- a/arch/x86/events/intel/cstate.c +++ b/arch/x86/events/intel/cstate.c @@ -541,6 +541,9 @@ static const struct x86_cpu_id intel_cstates_match[] __initconst = { X86_CSTATES_MODEL(INTEL_FAM6_SKYLAKE_MOBILE, snb_cstates), X86_CSTATES_MODEL(INTEL_FAM6_SKYLAKE_DESKTOP, snb_cstates), + X86_CSTATES_MODEL(INTEL_FAM6_KABYLAKE_MOBILE, snb_cstates), + X86_CSTATES_MODEL(INTEL_FAM6_KABYLAKE_DESKTOP, snb_cstates), + X86_CSTATES_MODEL(INTEL_FAM6_XEON_PHI_KNL, knl_cstates), X86_CSTATES_MODEL(INTEL_FAM6_XEON_PHI_KNM, knl_cstates), { }, diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c index 1c1b9fe705c8..5900471ee508 100644 --- a/arch/x86/events/intel/pt.c +++ b/arch/x86/events/intel/pt.c @@ -99,18 +99,24 @@ static struct attribute_group pt_cap_group = { }; PMU_FORMAT_ATTR(cyc, "config:1" ); +PMU_FORMAT_ATTR(pwr_evt, "config:4" ); +PMU_FORMAT_ATTR(fup_on_ptw, "config:5" ); PMU_FORMAT_ATTR(mtc, "config:9" ); PMU_FORMAT_ATTR(tsc, "config:10" ); PMU_FORMAT_ATTR(noretcomp, "config:11" ); +PMU_FORMAT_ATTR(ptw, "config:12" ); PMU_FORMAT_ATTR(mtc_period, "config:14-17" ); PMU_FORMAT_ATTR(cyc_thresh, "config:19-22" ); PMU_FORMAT_ATTR(psb_period, "config:24-27" ); static struct attribute *pt_formats_attr[] = { &format_attr_cyc.attr, + &format_attr_pwr_evt.attr, + &format_attr_fup_on_ptw.attr, &format_attr_mtc.attr, &format_attr_tsc.attr, &format_attr_noretcomp.attr, + &format_attr_ptw.attr, &format_attr_mtc_period.attr, &format_attr_cyc_thresh.attr, &format_attr_psb_period.attr, diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c index 22ef4f72cf32..22054ca49026 100644 --- a/arch/x86/events/intel/rapl.c +++ b/arch/x86/events/intel/rapl.c @@ -771,6 +771,9 @@ static const struct x86_cpu_id rapl_cpu_match[] __initconst = { X86_RAPL_MODEL_MATCH(INTEL_FAM6_SKYLAKE_DESKTOP, skl_rapl_init), X86_RAPL_MODEL_MATCH(INTEL_FAM6_SKYLAKE_X, hsx_rapl_init), + X86_RAPL_MODEL_MATCH(INTEL_FAM6_KABYLAKE_MOBILE, skl_rapl_init), + X86_RAPL_MODEL_MATCH(INTEL_FAM6_KABYLAKE_DESKTOP, skl_rapl_init), + X86_RAPL_MODEL_MATCH(INTEL_FAM6_ATOM_GOLDMONT, hsw_rapl_init), {}, }; diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c index 1ab45976474d..758c1aa5009d 100644 --- a/arch/x86/events/intel/uncore.c +++ b/arch/x86/events/intel/uncore.c @@ -1328,6 +1328,8 @@ static const struct x86_cpu_id intel_uncore_match[] __initconst = { X86_UNCORE_MODEL_MATCH(INTEL_FAM6_SKYLAKE_DESKTOP,skl_uncore_init), X86_UNCORE_MODEL_MATCH(INTEL_FAM6_SKYLAKE_MOBILE, skl_uncore_init), X86_UNCORE_MODEL_MATCH(INTEL_FAM6_SKYLAKE_X, skx_uncore_init), + X86_UNCORE_MODEL_MATCH(INTEL_FAM6_KABYLAKE_MOBILE, skl_uncore_init), + X86_UNCORE_MODEL_MATCH(INTEL_FAM6_KABYLAKE_DESKTOP, skl_uncore_init), {}, }; diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index eb3509338ae0..520b8dfe1640 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -745,7 +745,7 @@ __visible __used void *trampoline_handler(struct pt_regs *regs) * will be the real return address, and all the rest will * point to kretprobe_trampoline. */ - hlist_for_each_entry_safe(ri, tmp, head, hlist) { + hlist_for_each_entry(ri, head, hlist) { if (ri->task != current) /* another task is sharing our hash bucket */ continue; diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 8f6849084248..16ddfb8b304a 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -278,9 +278,13 @@ struct kprobe_insn_cache { int nr_garbage; }; +#ifdef __ARCH_WANT_KPROBES_INSN_SLOT extern kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c); extern void __free_insn_slot(struct kprobe_insn_cache *c, kprobe_opcode_t *slot, int dirty); +/* sleep-less address checking routine */ +extern bool __is_insn_slot_addr(struct kprobe_insn_cache *c, + unsigned long addr); #define DEFINE_INSN_CACHE_OPS(__name) \ extern struct kprobe_insn_cache kprobe_##__name##_slots; \ @@ -294,6 +298,18 @@ static inline void free_##__name##_slot(kprobe_opcode_t *slot, int dirty)\ { \ __free_insn_slot(&kprobe_##__name##_slots, slot, dirty); \ } \ + \ +static inline bool is_kprobe_##__name##_slot(unsigned long addr) \ +{ \ + return __is_insn_slot_addr(&kprobe_##__name##_slots, addr); \ +} +#else /* __ARCH_WANT_KPROBES_INSN_SLOT */ +#define DEFINE_INSN_CACHE_OPS(__name) \ +static inline bool is_kprobe_##__name##_slot(unsigned long addr) \ +{ \ + return 0; \ +} +#endif DEFINE_INSN_CACHE_OPS(insn); @@ -330,7 +346,6 @@ extern int proc_kprobes_optimization_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, loff_t *ppos); #endif - #endif /* CONFIG_OPTPROBES */ #ifdef CONFIG_KPROBES_ON_FTRACE extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip, @@ -481,6 +496,19 @@ static inline int enable_jprobe(struct jprobe *jp) return enable_kprobe(&jp->kp); } +#ifndef CONFIG_KPROBES +static inline bool is_kprobe_insn_slot(unsigned long addr) +{ + return false; +} +#endif +#ifndef CONFIG_OPTPROBES +static inline bool is_kprobe_optinsn_slot(unsigned long addr) +{ + return false; +} +#endif + #ifdef CONFIG_KPROBES /* * Blacklist ganerating macro. Specify functions which is not probed diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 78ed8105e64d..000fdb211c7d 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -482,6 +482,7 @@ struct perf_addr_filter { * @list: list of filters for this event * @lock: spinlock that serializes accesses to the @list and event's * (and its children's) filter generations. + * @nr_file_filters: number of file-based filters * * A child event will use parent's @list (and therefore @lock), so they are * bundled together; see perf_event_addr_filters(). @@ -489,6 +490,7 @@ struct perf_addr_filter { struct perf_addr_filters_head { struct list_head list; raw_spinlock_t lock; + unsigned int nr_file_filters; }; /** @@ -785,9 +787,9 @@ struct perf_cpu_context { ktime_t hrtimer_interval; unsigned int hrtimer_active; - struct pmu *unique_pmu; #ifdef CONFIG_CGROUP_PERF struct perf_cgroup *cgrp; + struct list_head cgrp_cpuctx_entry; #endif struct list_head sched_cb_entry; diff --git a/kernel/events/core.c b/kernel/events/core.c index e235bb991bdd..77a932b54a64 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -355,6 +355,8 @@ enum event_type_t { EVENT_FLEXIBLE = 0x1, EVENT_PINNED = 0x2, EVENT_TIME = 0x4, + /* see ctx_resched() for details */ + EVENT_CPU = 0x8, EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED, }; @@ -678,6 +680,8 @@ perf_cgroup_set_timestamp(struct task_struct *task, info->timestamp = ctx->timestamp; } +static DEFINE_PER_CPU(struct list_head, cgrp_cpuctx_list); + #define PERF_CGROUP_SWOUT 0x1 /* cgroup switch out every event */ #define PERF_CGROUP_SWIN 0x2 /* cgroup switch in events based on task */ @@ -690,61 +694,46 @@ perf_cgroup_set_timestamp(struct task_struct *task, static void perf_cgroup_switch(struct task_struct *task, int mode) { struct perf_cpu_context *cpuctx; - struct pmu *pmu; + struct list_head *list; unsigned long flags; /* - * disable interrupts to avoid geting nr_cgroup - * changes via __perf_event_disable(). Also - * avoids preemption. + * Disable interrupts and preemption to avoid this CPU's + * cgrp_cpuctx_entry to change under us. */ local_irq_save(flags); - /* - * we reschedule only in the presence of cgroup - * constrained events. - */ + list = this_cpu_ptr(&cgrp_cpuctx_list); + list_for_each_entry(cpuctx, list, cgrp_cpuctx_entry) { + WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0); - list_for_each_entry_rcu(pmu, &pmus, entry) { - cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); - if (cpuctx->unique_pmu != pmu) - continue; /* ensure we process each cpuctx once */ - - /* - * perf_cgroup_events says at least one - * context on this CPU has cgroup events. - * - * ctx->nr_cgroups reports the number of cgroup - * events for a context. - */ - if (cpuctx->ctx.nr_cgroups > 0) { - perf_ctx_lock(cpuctx, cpuctx->task_ctx); - perf_pmu_disable(cpuctx->ctx.pmu); + perf_ctx_lock(cpuctx, cpuctx->task_ctx); + perf_pmu_disable(cpuctx->ctx.pmu); - if (mode & PERF_CGROUP_SWOUT) { - cpu_ctx_sched_out(cpuctx, EVENT_ALL); - /* - * must not be done before ctxswout due - * to event_filter_match() in event_sched_out() - */ - cpuctx->cgrp = NULL; - } + if (mode & PERF_CGROUP_SWOUT) { + cpu_ctx_sched_out(cpuctx, EVENT_ALL); + /* + * must not be done before ctxswout due + * to event_filter_match() in event_sched_out() + */ + cpuctx->cgrp = NULL; + } - if (mode & PERF_CGROUP_SWIN) { - WARN_ON_ONCE(cpuctx->cgrp); - /* - * set cgrp before ctxsw in to allow - * event_filter_match() to not have to pass - * task around - * we pass the cpuctx->ctx to perf_cgroup_from_task() - * because cgorup events are only per-cpu - */ - cpuctx->cgrp = perf_cgroup_from_task(task, &cpuctx->ctx); - cpu_ctx_sched_in(cpuctx, EVENT_ALL, task); - } - perf_pmu_enable(cpuctx->ctx.pmu); - perf_ctx_unlock(cpuctx, cpuctx->task_ctx); + if (mode & PERF_CGROUP_SWIN) { + WARN_ON_ONCE(cpuctx->cgrp); + /* + * set cgrp before ctxsw in to allow + * event_filter_match() to not have to pass + * task around + * we pass the cpuctx->ctx to perf_cgroup_from_task() + * because cgorup events are only per-cpu + */ + cpuctx->cgrp = perf_cgroup_from_task(task, + &cpuctx->ctx); + cpu_ctx_sched_in(cpuctx, EVENT_ALL, task); } + perf_pmu_enable(cpuctx->ctx.pmu); + perf_ctx_unlock(cpuctx, cpuctx->task_ctx); } local_irq_restore(flags); @@ -889,6 +878,7 @@ list_update_cgroup_event(struct perf_event *event, struct perf_event_context *ctx, bool add) { struct perf_cpu_context *cpuctx; + struct list_head *cpuctx_entry; if (!is_cgroup_event(event)) return; @@ -902,15 +892,16 @@ list_update_cgroup_event(struct perf_event *event, * this will always be called from the right CPU. */ cpuctx = __get_cpu_context(ctx); - - /* - * cpuctx->cgrp is NULL until a cgroup event is sched in or - * ctx->nr_cgroup == 0 . - */ - if (add && perf_cgroup_from_task(current, ctx) == event->cgrp) - cpuctx->cgrp = event->cgrp; - else if (!add) + cpuctx_entry = &cpuctx->cgrp_cpuctx_entry; + /* cpuctx->cgrp is NULL unless a cgroup event is active in this CPU .*/ + if (add) { + list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list)); + if (perf_cgroup_from_task(current, ctx) == event->cgrp) + cpuctx->cgrp = event->cgrp; + } else { + list_del(cpuctx_entry); cpuctx->cgrp = NULL; + } } #else /* !CONFIG_CGROUP_PERF */ @@ -1453,6 +1444,20 @@ static void update_group_times(struct perf_event *leader) update_event_times(event); } +static enum event_type_t get_event_type(struct perf_event *event) +{ + struct perf_event_context *ctx = event->ctx; + enum event_type_t event_type; + + lockdep_assert_held(&ctx->lock); + + event_type = event->attr.pinned ? EVENT_PINNED : EVENT_FLEXIBLE; + if (!ctx->task) + event_type |= EVENT_CPU; + + return event_type; +} + static struct list_head * ctx_group_list(struct perf_event *event, struct perf_event_context *ctx) { @@ -2226,7 +2231,8 @@ ctx_sched_in(struct perf_event_context *ctx, struct task_struct *task); static void task_ctx_sched_out(struct perf_cpu_context *cpuctx, - struct perf_event_context *ctx) + struct perf_event_context *ctx, + enum event_type_t event_type) { if (!cpuctx->task_ctx) return; @@ -2234,7 +2240,7 @@ static void task_ctx_sched_out(struct perf_cpu_context *cpuctx, if (WARN_ON_ONCE(ctx != cpuctx->task_ctx)) return; - ctx_sched_out(ctx, cpuctx, EVENT_ALL); + ctx_sched_out(ctx, cpuctx, event_type); } static void perf_event_sched_in(struct perf_cpu_context *cpuctx, @@ -2249,13 +2255,51 @@ static void perf_event_sched_in(struct perf_cpu_context *cpuctx, ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE, task); } +/* + * We want to maintain the following priority of scheduling: + * - CPU pinned (EVENT_CPU | EVENT_PINNED) + * - task pinned (EVENT_PINNED) + * - CPU flexible (EVENT_CPU | EVENT_FLEXIBLE) + * - task flexible (EVENT_FLEXIBLE). + * + * In order to avoid unscheduling and scheduling back in everything every + * time an event is added, only do it for the groups of equal priority and + * below. + * + * This can be called after a batch operation on task events, in which case + * event_type is a bit mask of the types of events involved. For CPU events, + * event_type is only either EVENT_PINNED or EVENT_FLEXIBLE. + */ static void ctx_resched(struct perf_cpu_context *cpuctx, - struct perf_event_context *task_ctx) + struct perf_event_context *task_ctx, + enum event_type_t event_type) { + enum event_type_t ctx_event_type = event_type & EVENT_ALL; + bool cpu_event = !!(event_type & EVENT_CPU); + + /* + * If pinned groups are involved, flexible groups also need to be + * scheduled out. + */ + if (event_type & EVENT_PINNED) + event_type |= EVENT_FLEXIBLE; + perf_pmu_disable(cpuctx->ctx.pmu); if (task_ctx) - task_ctx_sched_out(cpuctx, task_ctx); - cpu_ctx_sched_out(cpuctx, EVENT_ALL); + task_ctx_sched_out(cpuctx, task_ctx, event_type); + + /* + * Decide which cpu ctx groups to schedule out based on the types + * of events that caused rescheduling: + * - EVENT_CPU: schedule out corresponding groups; + * - EVENT_PINNED task events: schedule out EVENT_FLEXIBLE groups; + * - otherwise, do nothing more. + */ + if (cpu_event) + cpu_ctx_sched_out(cpuctx, ctx_event_type); + else if (ctx_event_type & EVENT_PINNED) + cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE); + perf_event_sched_in(cpuctx, task_ctx, current); perf_pmu_enable(cpuctx->ctx.pmu); } @@ -2302,7 +2346,7 @@ static int __perf_install_in_context(void *info) if (reprogram) { ctx_sched_out(ctx, cpuctx, EVENT_TIME); add_event_to_ctx(event, ctx); - ctx_resched(cpuctx, task_ctx); + ctx_resched(cpuctx, task_ctx, get_event_type(event)); } else { add_event_to_ctx(event, ctx); } @@ -2469,7 +2513,7 @@ static void __perf_event_enable(struct perf_event *event, if (ctx->task) WARN_ON_ONCE(task_ctx != ctx); - ctx_resched(cpuctx, task_ctx); + ctx_resched(cpuctx, task_ctx, get_event_type(event)); } /* @@ -2896,7 +2940,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn, if (do_switch) { raw_spin_lock(&ctx->lock); - task_ctx_sched_out(cpuctx, ctx); + task_ctx_sched_out(cpuctx, ctx, EVENT_ALL); raw_spin_unlock(&ctx->lock); } } @@ -2943,7 +2987,7 @@ static void perf_pmu_sched_task(struct task_struct *prev, return; list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) { - pmu = cpuctx->unique_pmu; /* software PMUs will not have sched_task */ + pmu = cpuctx->ctx.pmu; /* software PMUs will not have sched_task */ if (WARN_ON_ONCE(!pmu->sched_task)) continue; @@ -3133,8 +3177,12 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx, * We want to keep the following priority order: * cpu pinned (that don't need to move), task pinned, * cpu flexible, task flexible. + * + * However, if task's ctx is not carrying any pinned + * events, no need to flip the cpuctx's events around. */ - cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE); + if (!list_empty(&ctx->pinned_groups)) + cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE); perf_event_sched_in(cpuctx, ctx, task); perf_pmu_enable(ctx->pmu); perf_ctx_unlock(cpuctx, ctx); @@ -3449,6 +3497,7 @@ static int event_enable_on_exec(struct perf_event *event, static void perf_event_enable_on_exec(int ctxn) { struct perf_event_context *ctx, *clone_ctx = NULL; + enum event_type_t event_type = 0; struct perf_cpu_context *cpuctx; struct perf_event *event; unsigned long flags; @@ -3462,15 +3511,17 @@ static void perf_event_enable_on_exec(int ctxn) cpuctx = __get_cpu_context(ctx); perf_ctx_lock(cpuctx, ctx); ctx_sched_out(ctx, cpuctx, EVENT_TIME); - list_for_each_entry(event, &ctx->event_list, event_entry) + list_for_each_entry(event, &ctx->event_list, event_entry) { enabled |= event_enable_on_exec(event, ctx); + event_type |= get_event_type(event); + } /* * Unclone and reschedule this context if we enabled any event. */ if (enabled) { clone_ctx = unclone_ctx(ctx); - ctx_resched(cpuctx, ctx); + ctx_resched(cpuctx, ctx, event_type); } perf_ctx_unlock(cpuctx, ctx); @@ -8044,6 +8095,9 @@ static void perf_event_addr_filters_apply(struct perf_event *event) if (task == TASK_TOMBSTONE) return; + if (!ifh->nr_file_filters) + return; + mm = get_task_mm(event->ctx->task); if (!mm) goto restart; @@ -8214,6 +8268,7 @@ perf_event_parse_addr_filter(struct perf_event *event, char *fstr, * attribute. */ if (state == IF_STATE_END) { + ret = -EINVAL; if (kernel && event->attr.exclude_kernel) goto fail; @@ -8221,6 +8276,18 @@ perf_event_parse_addr_filter(struct perf_event *event, char *fstr, if (!filename) goto fail; + /* + * For now, we only support file-based filters + * in per-task events; doing so for CPU-wide + * events requires additional context switching + * trickery, since same object code will be + * mapped at different virtual addresses in + * different processes. + */ + ret = -EOPNOTSUPP; + if (!event->ctx->task) + goto fail_free_name; + /* look up the path and grab its inode */ ret = kern_path(filename, LOOKUP_FOLLOW, &path); if (ret) @@ -8236,6 +8303,8 @@ perf_event_parse_addr_filter(struct perf_event *event, char *fstr, !S_ISREG(filter->inode->i_mode)) /* free_filters_list() will iput() */ goto fail; + + event->addr_filters.nr_file_filters++; } /* ready to consume more filters */ @@ -8275,24 +8344,13 @@ perf_event_set_addr_filter(struct perf_event *event, char *filter_str) if (WARN_ON_ONCE(event->parent)) return -EINVAL; - /* - * For now, we only support filtering in per-task events; doing so - * for CPU-wide events requires additional context switching trickery, - * since same object code will be mapped at different virtual - * addresses in different processes. - */ - if (!event->ctx->task) - return -EOPNOTSUPP; - ret = perf_event_parse_addr_filter(event, filter_str, &filters); if (ret) - return ret; + goto fail_clear_files; ret = event->pmu->addr_filters_validate(&filters); - if (ret) { - free_filters_list(&filters); - return ret; - } + if (ret) + goto fail_free_filters; /* remove existing filters, if any */ perf_addr_filters_splice(event, &filters); @@ -8301,6 +8359,14 @@ perf_event_set_addr_filter(struct perf_event *event, char *filter_str) perf_event_for_each_child(event, perf_event_addr_filters_apply); return ret; + +fail_free_filters: + free_filters_list(&filters); + +fail_clear_files: + event->addr_filters.nr_file_filters = 0; + + return ret; } static int perf_event_set_filter(struct perf_event *event, void __user *arg) @@ -8652,37 +8718,10 @@ static struct perf_cpu_context __percpu *find_pmu_context(int ctxn) return NULL; } -static void update_pmu_context(struct pmu *pmu, struct pmu *old_pmu) -{ - int cpu; - - for_each_possible_cpu(cpu) { - struct perf_cpu_context *cpuctx; - - cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu); - - if (cpuctx->unique_pmu == old_pmu) - cpuctx->unique_pmu = pmu; - } -} - static void free_pmu_context(struct pmu *pmu) { - struct pmu *i; - mutex_lock(&pmus_lock); - /* - * Like a real lame refcount. - */ - list_for_each_entry(i, &pmus, entry) { - if (i->pmu_cpu_context == pmu->pmu_cpu_context) { - update_pmu_context(i, pmu); - goto out; - } - } - free_percpu(pmu->pmu_cpu_context); -out: mutex_unlock(&pmus_lock); } @@ -8886,8 +8925,6 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type) cpuctx->ctx.pmu = pmu; __perf_mux_hrtimer_init(cpuctx, cpu); - - cpuctx->unique_pmu = pmu; } got_cpu_context: @@ -9005,6 +9042,14 @@ static struct pmu *perf_init_event(struct perf_event *event) idx = srcu_read_lock(&pmus_srcu); + /* Try parent's PMU first: */ + if (event->parent && event->parent->pmu) { + pmu = event->parent->pmu; + ret = perf_try_init_event(pmu, event); + if (!ret) + goto unlock; + } + rcu_read_lock(); pmu = idr_find(&pmu_idr, event->attr.type); rcu_read_unlock(); @@ -10265,7 +10310,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn) * in. */ raw_spin_lock_irq(&child_ctx->lock); - task_ctx_sched_out(__get_cpu_context(child_ctx), child_ctx); + task_ctx_sched_out(__get_cpu_context(child_ctx), child_ctx, EVENT_ALL); /* * Now that the context is inactive, destroy the task <-> ctx relation @@ -10714,6 +10759,9 @@ static void __init perf_event_init_all_cpus(void) INIT_LIST_HEAD(&per_cpu(pmu_sb_events.list, cpu)); raw_spin_lock_init(&per_cpu(pmu_sb_events.lock, cpu)); +#ifdef CONFIG_CGROUP_PERF + INIT_LIST_HEAD(&per_cpu(cgrp_cpuctx_list, cpu)); +#endif INIT_LIST_HEAD(&per_cpu(sched_cb_list, cpu)); } } diff --git a/kernel/extable.c b/kernel/extable.c index e3beec4a2339..e1359474baa5 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -104,6 +105,8 @@ int __kernel_text_address(unsigned long addr) return 1; if (is_ftrace_trampoline(addr)) return 1; + if (is_kprobe_optinsn_slot(addr) || is_kprobe_insn_slot(addr)) + return 1; /* * There might be init symbols in saved stacktraces. * Give those symbols a chance to be printed in @@ -123,7 +126,11 @@ int kernel_text_address(unsigned long addr) return 1; if (is_module_text_address(addr)) return 1; - return is_ftrace_trampoline(addr); + if (is_ftrace_trampoline(addr)) + return 1; + if (is_kprobe_optinsn_slot(addr) || is_kprobe_insn_slot(addr)) + return 1; + return 0; } /* diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 43460104f119..ebb4dadca66b 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -149,9 +149,11 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) struct kprobe_insn_page *kip; kprobe_opcode_t *slot = NULL; + /* Since the slot array is not protected by rcu, we need a mutex */ mutex_lock(&c->mutex); retry: - list_for_each_entry(kip, &c->pages, list) { + rcu_read_lock(); + list_for_each_entry_rcu(kip, &c->pages, list) { if (kip->nused < slots_per_page(c)) { int i; for (i = 0; i < slots_per_page(c); i++) { @@ -159,6 +161,7 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) kip->slot_used[i] = SLOT_USED; kip->nused++; slot = kip->insns + (i * c->insn_size); + rcu_read_unlock(); goto out; } } @@ -167,6 +170,7 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) WARN_ON(1); } } + rcu_read_unlock(); /* If there are any garbage slots, collect it and try again. */ if (c->nr_garbage && collect_garbage_slots(c) == 0) @@ -193,7 +197,7 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) kip->nused = 1; kip->ngarbage = 0; kip->cache = c; - list_add(&kip->list, &c->pages); + list_add_rcu(&kip->list, &c->pages); slot = kip->insns; out: mutex_unlock(&c->mutex); @@ -213,7 +217,8 @@ static int collect_one_slot(struct kprobe_insn_page *kip, int idx) * next time somebody inserts a probe. */ if (!list_is_singular(&kip->list)) { - list_del(&kip->list); + list_del_rcu(&kip->list); + synchronize_rcu(); kip->cache->free(kip->insns); kfree(kip); } @@ -235,8 +240,7 @@ static int collect_garbage_slots(struct kprobe_insn_cache *c) continue; kip->ngarbage = 0; /* we will collect all garbages */ for (i = 0; i < slots_per_page(c); i++) { - if (kip->slot_used[i] == SLOT_DIRTY && - collect_one_slot(kip, i)) + if (kip->slot_used[i] == SLOT_DIRTY && collect_one_slot(kip, i)) break; } } @@ -248,29 +252,60 @@ void __free_insn_slot(struct kprobe_insn_cache *c, kprobe_opcode_t *slot, int dirty) { struct kprobe_insn_page *kip; + long idx; mutex_lock(&c->mutex); - list_for_each_entry(kip, &c->pages, list) { - long idx = ((long)slot - (long)kip->insns) / - (c->insn_size * sizeof(kprobe_opcode_t)); - if (idx >= 0 && idx < slots_per_page(c)) { - WARN_ON(kip->slot_used[idx] != SLOT_USED); - if (dirty) { - kip->slot_used[idx] = SLOT_DIRTY; - kip->ngarbage++; - if (++c->nr_garbage > slots_per_page(c)) - collect_garbage_slots(c); - } else - collect_one_slot(kip, idx); + rcu_read_lock(); + list_for_each_entry_rcu(kip, &c->pages, list) { + idx = ((long)slot - (long)kip->insns) / + (c->insn_size * sizeof(kprobe_opcode_t)); + if (idx >= 0 && idx < slots_per_page(c)) goto out; - } } - /* Could not free this slot. */ + /* Could not find this slot. */ WARN_ON(1); + kip = NULL; out: + rcu_read_unlock(); + /* Mark and sweep: this may sleep */ + if (kip) { + /* Check double free */ + WARN_ON(kip->slot_used[idx] != SLOT_USED); + if (dirty) { + kip->slot_used[idx] = SLOT_DIRTY; + kip->ngarbage++; + if (++c->nr_garbage > slots_per_page(c)) + collect_garbage_slots(c); + } else { + collect_one_slot(kip, idx); + } + } mutex_unlock(&c->mutex); } +/* + * Check given address is on the page of kprobe instruction slots. + * This will be used for checking whether the address on a stack + * is on a text area or not. + */ +bool __is_insn_slot_addr(struct kprobe_insn_cache *c, unsigned long addr) +{ + struct kprobe_insn_page *kip; + bool ret = false; + + rcu_read_lock(); + list_for_each_entry_rcu(kip, &c->pages, list) { + if (addr >= (unsigned long)kip->insns && + addr < (unsigned long)kip->insns + PAGE_SIZE) { + ret = true; + break; + } + } + rcu_read_unlock(); + + return ret; +} + #ifdef CONFIG_OPTPROBES /* For optimized_kprobe buffer */ struct kprobe_insn_cache kprobe_optinsn_slots = { diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c index 396e204888b3..b86ee54da2d1 100644 --- a/samples/bpf/bpf_load.c +++ b/samples/bpf/bpf_load.c @@ -277,6 +277,11 @@ int load_bpf_file(char *path) Elf_Data *data, *data_prog, *symbols = NULL; char *shname, *shname_prog; + /* reset global variables */ + kern_version = 0; + memset(license, 0, sizeof(license)); + memset(processed_sec, 0, sizeof(processed_sec)); + if (elf_version(EV_CURRENT) == EV_NONE) return 1; @@ -328,6 +333,8 @@ int load_bpf_file(char *path) /* load programs that need map fixup (relocations) */ for (i = 1; i < ehdr.e_shnum; i++) { + if (processed_sec[i]) + continue; if (get_sec(elf, i, &ehdr, &shname, &shdr, &data)) continue; diff --git a/samples/bpf/tracex5_kern.c b/samples/bpf/tracex5_kern.c index fd12d7154d42..7e4cf74553ff 100644 --- a/samples/bpf/tracex5_kern.c +++ b/samples/bpf/tracex5_kern.c @@ -8,6 +8,7 @@ #include #include #include +#include #include "bpf_helpers.h" #define PROG(F) SEC("kprobe/"__stringify(F)) int bpf_func_##F diff --git a/tools/arch/arm/include/uapi/asm/kvm.h b/tools/arch/arm/include/uapi/asm/kvm.h index a2b3eb313a25..af05f8e0903e 100644 --- a/tools/arch/arm/include/uapi/asm/kvm.h +++ b/tools/arch/arm/include/uapi/asm/kvm.h @@ -84,6 +84,15 @@ struct kvm_regs { #define KVM_VGIC_V2_DIST_SIZE 0x1000 #define KVM_VGIC_V2_CPU_SIZE 0x2000 +/* Supported VGICv3 address types */ +#define KVM_VGIC_V3_ADDR_TYPE_DIST 2 +#define KVM_VGIC_V3_ADDR_TYPE_REDIST 3 +#define KVM_VGIC_ITS_ADDR_TYPE 4 + +#define KVM_VGIC_V3_DIST_SIZE SZ_64K +#define KVM_VGIC_V3_REDIST_SIZE (2 * SZ_64K) +#define KVM_VGIC_V3_ITS_SIZE (2 * SZ_64K) + #define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */ #define KVM_ARM_VCPU_PSCI_0_2 1 /* CPU uses PSCI v0.2 */ diff --git a/tools/arch/powerpc/include/uapi/asm/kvm.h b/tools/arch/powerpc/include/uapi/asm/kvm.h index c93cf35ce379..3603b6f51b11 100644 --- a/tools/arch/powerpc/include/uapi/asm/kvm.h +++ b/tools/arch/powerpc/include/uapi/asm/kvm.h @@ -573,6 +573,10 @@ struct kvm_get_htab_header { #define KVM_REG_PPC_SPRG9 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xba) #define KVM_REG_PPC_DBSR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbb) +/* POWER9 registers */ +#define KVM_REG_PPC_TIDR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbc) +#define KVM_REG_PPC_PSSCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbd) + /* Transactional Memory checkpointed state: * This is all GPRs, all VSX regs and a subset of SPRs */ @@ -596,6 +600,7 @@ struct kvm_get_htab_header { #define KVM_REG_PPC_TM_VSCR (KVM_REG_PPC_TM | KVM_REG_SIZE_U32 | 0x67) #define KVM_REG_PPC_TM_DSCR (KVM_REG_PPC_TM | KVM_REG_SIZE_U64 | 0x68) #define KVM_REG_PPC_TM_TAR (KVM_REG_PPC_TM | KVM_REG_SIZE_U64 | 0x69) +#define KVM_REG_PPC_TM_XER (KVM_REG_PPC_TM | KVM_REG_SIZE_U64 | 0x6a) /* PPC64 eXternal Interrupt Controller Specification */ #define KVM_DEV_XICS_GRP_SOURCES 1 /* 64-bit source attributes */ diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h index cddd5d06e1cb..eafee3161d1c 100644 --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -105,6 +105,7 @@ #define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ #define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ +#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */ /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */ #define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ @@ -188,10 +189,14 @@ #define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */ #define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */ +#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */ +#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */ +#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */ #define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ +#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ #define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ @@ -220,11 +225,13 @@ #define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */ #define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */ #define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */ +#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */ #define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */ #define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */ #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */ #define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */ #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ +#define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */ #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */ #define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */ #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */ @@ -278,8 +285,10 @@ #define X86_FEATURE_AVIC (15*32+13) /* Virtual Interrupt Controller */ /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */ +#define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ #define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ +#define X86_FEATURE_RDPID (16*32+ 22) /* RDPID instruction */ /* AMD-defined CPU features, CPUID level 0x80000007 (ebx), word 17 */ #define X86_FEATURE_OVERFLOW_RECOV (17*32+0) /* MCA overflow recovery support */ @@ -310,4 +319,6 @@ #define X86_BUG_NULL_SEG X86_BUG(10) /* Nulling a selector preserves the base */ #define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ #define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ +#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ + #endif /* _ASM_X86_CPUFEATURES_H */ diff --git a/tools/arch/x86/include/uapi/asm/vmx.h b/tools/arch/x86/include/uapi/asm/vmx.h index 37fee272618f..14458658e988 100644 --- a/tools/arch/x86/include/uapi/asm/vmx.h +++ b/tools/arch/x86/include/uapi/asm/vmx.h @@ -65,6 +65,8 @@ #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS 44 #define EXIT_REASON_EOI_INDUCED 45 +#define EXIT_REASON_GDTR_IDTR 46 +#define EXIT_REASON_LDTR_TR 47 #define EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG 49 #define EXIT_REASON_INVEPT 50 @@ -113,6 +115,8 @@ { EXIT_REASON_MCE_DURING_VMENTRY, "MCE_DURING_VMENTRY" }, \ { EXIT_REASON_TPR_BELOW_THRESHOLD, "TPR_BELOW_THRESHOLD" }, \ { EXIT_REASON_APIC_ACCESS, "APIC_ACCESS" }, \ + { EXIT_REASON_GDTR_IDTR, "GDTR_IDTR" }, \ + { EXIT_REASON_LDTR_TR, "LDTR_TR" }, \ { EXIT_REASON_EPT_VIOLATION, "EPT_VIOLATION" }, \ { EXIT_REASON_EPT_MISCONFIG, "EPT_MISCONFIG" }, \ { EXIT_REASON_INVEPT, "INVEPT" }, \ @@ -129,6 +133,7 @@ { EXIT_REASON_XRSTORS, "XRSTORS" } #define VMX_ABORT_SAVE_GUEST_MSR_FAIL 1 +#define VMX_ABORT_LOAD_HOST_PDPTE_FAIL 2 #define VMX_ABORT_LOAD_HOST_MSR_FAIL 4 #endif /* _UAPIVMX_H */ diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build index 99c0ccd2f176..e279a71c650d 100644 --- a/tools/build/Makefile.build +++ b/tools/build/Makefile.build @@ -19,6 +19,16 @@ else Q=@ endif +ifneq ($(filter 4.%,$(MAKE_VERSION)),) # make-4 +ifneq ($(filter %s ,$(firstword x$(MAKEFLAGS))),) + quiet=silent_ +endif +else # make-3.8x +ifneq ($(filter s% -s%,$(MAKEFLAGS)),) + quiet=silent_ +endif +endif + build-dir := $(srctree)/tools/build # Define $(fixdep) for dep-cmd function diff --git a/tools/include/linux/compiler-gcc.h b/tools/include/linux/compiler-gcc.h new file mode 100644 index 000000000000..48af2f10a42d --- /dev/null +++ b/tools/include/linux/compiler-gcc.h @@ -0,0 +1,14 @@ +#ifndef _TOOLS_LINUX_COMPILER_H_ +#error "Please don't include directly, include instead." +#endif + +/* + * Common definitions for all gcc versions go here. + */ +#define GCC_VERSION (__GNUC__ * 10000 \ + + __GNUC_MINOR__ * 100 \ + + __GNUC_PATCHLEVEL__) + +#if GCC_VERSION >= 70000 && !defined(__CHECKER__) +# define __fallthrough __attribute__ ((fallthrough)) +#endif diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h index e33fc1df3935..6326ede9aece 100644 --- a/tools/include/linux/compiler.h +++ b/tools/include/linux/compiler.h @@ -1,6 +1,10 @@ #ifndef _TOOLS_LINUX_COMPILER_H_ #define _TOOLS_LINUX_COMPILER_H_ +#ifdef __GNUC__ +#include +#endif + /* Optimization barrier */ /* The "volatile" is due to gcc bugs */ #define barrier() __asm__ __volatile__("": : :"memory") @@ -126,4 +130,9 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s #define WRITE_ONCE(x, val) \ ({ union { typeof(x) __val; char __c[1]; } __u = { .__val = (val) }; __write_once_size(&(x), __u.__c, sizeof(x)); __u.__val; }) + +#ifndef __fallthrough +# define __fallthrough +#endif + #endif /* _TOOLS_LINUX_COMPILER_H */ diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile index adba83b325d5..eb6e0b36bfc1 100644 --- a/tools/lib/api/Makefile +++ b/tools/lib/api/Makefile @@ -17,7 +17,13 @@ MAKEFLAGS += --no-print-directory LIBFILE = $(OUTPUT)libapi.a CFLAGS := $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -O6 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC +CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC + +ifeq ($(CC), clang) + CFLAGS += -O3 +else + CFLAGS += -O6 +endif # Treat warnings as errors unless directed not to ifneq ($(WERROR),0) diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c index f99f49e4a31e..4b6bfc43cccf 100644 --- a/tools/lib/api/fs/fs.c +++ b/tools/lib/api/fs/fs.c @@ -38,6 +38,10 @@ #define HUGETLBFS_MAGIC 0x958458f6 #endif +#ifndef BPF_FS_MAGIC +#define BPF_FS_MAGIC 0xcafe4a11 +#endif + static const char * const sysfs__fs_known_mountpoints[] = { "/sys", 0, @@ -75,6 +79,11 @@ static const char * const hugetlbfs__known_mountpoints[] = { 0, }; +static const char * const bpf_fs__known_mountpoints[] = { + "/sys/fs/bpf", + 0, +}; + struct fs { const char *name; const char * const *mounts; @@ -89,6 +98,7 @@ enum { FS__DEBUGFS = 2, FS__TRACEFS = 3, FS__HUGETLBFS = 4, + FS__BPF_FS = 5, }; #ifndef TRACEFS_MAGIC @@ -121,6 +131,11 @@ static struct fs fs__entries[] = { .mounts = hugetlbfs__known_mountpoints, .magic = HUGETLBFS_MAGIC, }, + [FS__BPF_FS] = { + .name = "bpf", + .mounts = bpf_fs__known_mountpoints, + .magic = BPF_FS_MAGIC, + }, }; static bool fs__read_mounts(struct fs *fs) @@ -280,6 +295,7 @@ FS(procfs, FS__PROCFS); FS(debugfs, FS__DEBUGFS); FS(tracefs, FS__TRACEFS); FS(hugetlbfs, FS__HUGETLBFS); +FS(bpf_fs, FS__BPF_FS); int filename__read_int(const char *filename, int *value) { diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h index a63269f5d20c..6b332dc74498 100644 --- a/tools/lib/api/fs/fs.h +++ b/tools/lib/api/fs/fs.h @@ -22,6 +22,7 @@ FS(procfs) FS(debugfs) FS(tracefs) FS(hugetlbfs) +FS(bpf_fs) #undef FS diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c index 251b7c342a87..3e606b9c443e 100644 --- a/tools/lib/api/fs/tracing_path.c +++ b/tools/lib/api/fs/tracing_path.c @@ -86,9 +86,13 @@ void put_tracing_file(char *file) free(file); } -static int strerror_open(int err, char *buf, size_t size, const char *filename) +int tracing_path__strerror_open_tp(int err, char *buf, size_t size, + const char *sys, const char *name) { char sbuf[128]; + char filename[PATH_MAX]; + + snprintf(filename, PATH_MAX, "%s/%s", sys, name ?: "*"); switch (err) { case ENOENT: @@ -99,10 +103,19 @@ static int strerror_open(int err, char *buf, size_t size, const char *filename) * - jirka */ if (debugfs__configured() || tracefs__configured()) { - snprintf(buf, size, - "Error:\tFile %s/%s not found.\n" - "Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n", - tracing_events_path, filename); + /* sdt markers */ + if (!strncmp(filename, "sdt_", 4)) { + snprintf(buf, size, + "Error:\tFile %s/%s not found.\n" + "Hint:\tSDT event cannot be directly recorded on.\n" + "\tPlease first use 'perf probe %s:%s' before recording it.\n", + tracing_events_path, filename, sys, name); + } else { + snprintf(buf, size, + "Error:\tFile %s/%s not found.\n" + "Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n", + tracing_events_path, filename); + } break; } snprintf(buf, size, "%s", @@ -125,12 +138,3 @@ static int strerror_open(int err, char *buf, size_t size, const char *filename) return 0; } - -int tracing_path__strerror_open_tp(int err, char *buf, size_t size, const char *sys, const char *name) -{ - char path[PATH_MAX]; - - snprintf(path, PATH_MAX, "%s/%s", sys, name ?: "*"); - - return strerror_open(err, buf, size, path); -} diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index a2f9853dd882..df6e186da788 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -22,6 +22,7 @@ #define __BPF_BPF_H #include +#include int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, int max_entries, __u32 map_flags); diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 84e6b35da4bd..ac6eb863b2a4 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -4,6 +4,7 @@ * Copyright (C) 2013-2015 Alexei Starovoitov * Copyright (C) 2015 Wang Nan * Copyright (C) 2015 Huawei Inc. + * Copyright (C) 2017 Nicira, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -22,15 +23,21 @@ #include #include #include +#include #include #include #include #include #include #include +#include #include #include #include +#include +#include +#include +#include #include #include @@ -41,6 +48,10 @@ #define EM_BPF 247 #endif +#ifndef BPF_FS_MAGIC +#define BPF_FS_MAGIC 0xcafe4a11 +#endif + #define __printf(a, b) __attribute__((format(printf, a, b))) __printf(1, 2) @@ -779,7 +790,7 @@ static int bpf_program__collect_reloc(struct bpf_program *prog, size_t nr_maps, GElf_Shdr *shdr, Elf_Data *data, Elf_Data *symbols, - int maps_shndx) + int maps_shndx, struct bpf_map *maps) { int i, nrels; @@ -829,7 +840,15 @@ bpf_program__collect_reloc(struct bpf_program *prog, return -LIBBPF_ERRNO__RELOC; } - map_idx = sym.st_value / sizeof(struct bpf_map_def); + /* TODO: 'maps' is sorted. We can use bsearch to make it faster. */ + for (map_idx = 0; map_idx < nr_maps; map_idx++) { + if (maps[map_idx].offset == sym.st_value) { + pr_debug("relocation: find map %zd (%s) for insn %u\n", + map_idx, maps[map_idx].name, insn_idx); + break; + } + } + if (map_idx >= nr_maps) { pr_warning("bpf relocation: map_idx %d large than %d\n", (int)map_idx, (int)nr_maps - 1); @@ -953,7 +972,8 @@ static int bpf_object__collect_reloc(struct bpf_object *obj) err = bpf_program__collect_reloc(prog, nr_maps, shdr, data, obj->efile.symbols, - obj->efile.maps_shndx); + obj->efile.maps_shndx, + obj->maps); if (err) return err; } @@ -1227,6 +1247,191 @@ int bpf_object__load(struct bpf_object *obj) return err; } +static int check_path(const char *path) +{ + struct statfs st_fs; + char *dname, *dir; + int err = 0; + + if (path == NULL) + return -EINVAL; + + dname = strdup(path); + if (dname == NULL) + return -ENOMEM; + + dir = dirname(dname); + if (statfs(dir, &st_fs)) { + pr_warning("failed to statfs %s: %s\n", dir, strerror(errno)); + err = -errno; + } + free(dname); + + if (!err && st_fs.f_type != BPF_FS_MAGIC) { + pr_warning("specified path %s is not on BPF FS\n", path); + err = -EINVAL; + } + + return err; +} + +int bpf_program__pin_instance(struct bpf_program *prog, const char *path, + int instance) +{ + int err; + + err = check_path(path); + if (err) + return err; + + if (prog == NULL) { + pr_warning("invalid program pointer\n"); + return -EINVAL; + } + + if (instance < 0 || instance >= prog->instances.nr) { + pr_warning("invalid prog instance %d of prog %s (max %d)\n", + instance, prog->section_name, prog->instances.nr); + return -EINVAL; + } + + if (bpf_obj_pin(prog->instances.fds[instance], path)) { + pr_warning("failed to pin program: %s\n", strerror(errno)); + return -errno; + } + pr_debug("pinned program '%s'\n", path); + + return 0; +} + +static int make_dir(const char *path) +{ + int err = 0; + + if (mkdir(path, 0700) && errno != EEXIST) + err = -errno; + + if (err) + pr_warning("failed to mkdir %s: %s\n", path, strerror(-err)); + return err; +} + +int bpf_program__pin(struct bpf_program *prog, const char *path) +{ + int i, err; + + err = check_path(path); + if (err) + return err; + + if (prog == NULL) { + pr_warning("invalid program pointer\n"); + return -EINVAL; + } + + if (prog->instances.nr <= 0) { + pr_warning("no instances of prog %s to pin\n", + prog->section_name); + return -EINVAL; + } + + err = make_dir(path); + if (err) + return err; + + for (i = 0; i < prog->instances.nr; i++) { + char buf[PATH_MAX]; + int len; + + len = snprintf(buf, PATH_MAX, "%s/%d", path, i); + if (len < 0) + return -EINVAL; + else if (len >= PATH_MAX) + return -ENAMETOOLONG; + + err = bpf_program__pin_instance(prog, buf, i); + if (err) + return err; + } + + return 0; +} + +int bpf_map__pin(struct bpf_map *map, const char *path) +{ + int err; + + err = check_path(path); + if (err) + return err; + + if (map == NULL) { + pr_warning("invalid map pointer\n"); + return -EINVAL; + } + + if (bpf_obj_pin(map->fd, path)) { + pr_warning("failed to pin map: %s\n", strerror(errno)); + return -errno; + } + + pr_debug("pinned map '%s'\n", path); + return 0; +} + +int bpf_object__pin(struct bpf_object *obj, const char *path) +{ + struct bpf_program *prog; + struct bpf_map *map; + int err; + + if (!obj) + return -ENOENT; + + if (!obj->loaded) { + pr_warning("object not yet loaded; load it first\n"); + return -ENOENT; + } + + err = make_dir(path); + if (err) + return err; + + bpf_map__for_each(map, obj) { + char buf[PATH_MAX]; + int len; + + len = snprintf(buf, PATH_MAX, "%s/%s", path, + bpf_map__name(map)); + if (len < 0) + return -EINVAL; + else if (len >= PATH_MAX) + return -ENAMETOOLONG; + + err = bpf_map__pin(map, buf); + if (err) + return err; + } + + bpf_object__for_each_program(prog, obj) { + char buf[PATH_MAX]; + int len; + + len = snprintf(buf, PATH_MAX, "%s/%s", path, + prog->section_name); + if (len < 0) + return -EINVAL; + else if (len >= PATH_MAX) + return -ENAMETOOLONG; + + err = bpf_program__pin(prog, buf); + if (err) + return err; + } + + return 0; +} + void bpf_object__close(struct bpf_object *obj) { size_t i; @@ -1419,37 +1624,33 @@ static void bpf_program__set_type(struct bpf_program *prog, prog->type = type; } -int bpf_program__set_tracepoint(struct bpf_program *prog) -{ - if (!prog) - return -EINVAL; - bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT); - return 0; -} - -int bpf_program__set_kprobe(struct bpf_program *prog) -{ - if (!prog) - return -EINVAL; - bpf_program__set_type(prog, BPF_PROG_TYPE_KPROBE); - return 0; -} - static bool bpf_program__is_type(struct bpf_program *prog, enum bpf_prog_type type) { return prog ? (prog->type == type) : false; } -bool bpf_program__is_tracepoint(struct bpf_program *prog) -{ - return bpf_program__is_type(prog, BPF_PROG_TYPE_TRACEPOINT); -} - -bool bpf_program__is_kprobe(struct bpf_program *prog) -{ - return bpf_program__is_type(prog, BPF_PROG_TYPE_KPROBE); -} +#define BPF_PROG_TYPE_FNS(NAME, TYPE) \ +int bpf_program__set_##NAME(struct bpf_program *prog) \ +{ \ + if (!prog) \ + return -EINVAL; \ + bpf_program__set_type(prog, TYPE); \ + return 0; \ +} \ + \ +bool bpf_program__is_##NAME(struct bpf_program *prog) \ +{ \ + return bpf_program__is_type(prog, TYPE); \ +} \ + +BPF_PROG_TYPE_FNS(socket_filter, BPF_PROG_TYPE_SOCKET_FILTER); +BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE); +BPF_PROG_TYPE_FNS(sched_cls, BPF_PROG_TYPE_SCHED_CLS); +BPF_PROG_TYPE_FNS(sched_act, BPF_PROG_TYPE_SCHED_ACT); +BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT); +BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP); +BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT); int bpf_map__fd(struct bpf_map *map) { @@ -1537,3 +1738,10 @@ bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset) } return ERR_PTR(-ENOENT); } + +long libbpf_get_error(const void *ptr) +{ + if (IS_ERR(ptr)) + return PTR_ERR(ptr); + return 0; +} diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index a5a8b86a06fe..b30394f9947a 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -22,8 +22,8 @@ #define __BPF_LIBBPF_H #include +#include #include -#include #include // for size_t enum libbpf_errno { @@ -65,6 +65,7 @@ struct bpf_object *bpf_object__open(const char *path); struct bpf_object *bpf_object__open_buffer(void *obj_buf, size_t obj_buf_sz, const char *name); +int bpf_object__pin(struct bpf_object *object, const char *path); void bpf_object__close(struct bpf_object *object); /* Load/unload object into/from kernel */ @@ -106,6 +107,9 @@ void *bpf_program__priv(struct bpf_program *prog); const char *bpf_program__title(struct bpf_program *prog, bool needs_copy); int bpf_program__fd(struct bpf_program *prog); +int bpf_program__pin_instance(struct bpf_program *prog, const char *path, + int instance); +int bpf_program__pin(struct bpf_program *prog, const char *path); struct bpf_insn; @@ -174,11 +178,21 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n); /* * Adjust type of bpf program. Default is kprobe. */ +int bpf_program__set_socket_filter(struct bpf_program *prog); int bpf_program__set_tracepoint(struct bpf_program *prog); int bpf_program__set_kprobe(struct bpf_program *prog); +int bpf_program__set_sched_cls(struct bpf_program *prog); +int bpf_program__set_sched_act(struct bpf_program *prog); +int bpf_program__set_xdp(struct bpf_program *prog); +int bpf_program__set_perf_event(struct bpf_program *prog); +bool bpf_program__is_socket_filter(struct bpf_program *prog); bool bpf_program__is_tracepoint(struct bpf_program *prog); bool bpf_program__is_kprobe(struct bpf_program *prog); +bool bpf_program__is_sched_cls(struct bpf_program *prog); +bool bpf_program__is_sched_act(struct bpf_program *prog); +bool bpf_program__is_xdp(struct bpf_program *prog); +bool bpf_program__is_perf_event(struct bpf_program *prog); /* * We don't need __attribute__((packed)) now since it is @@ -223,5 +237,8 @@ typedef void (*bpf_map_clear_priv_t)(struct bpf_map *, void *); int bpf_map__set_priv(struct bpf_map *map, void *priv, bpf_map_clear_priv_t clear_priv); void *bpf_map__priv(struct bpf_map *map); +int bpf_map__pin(struct bpf_map *map, const char *path); + +long libbpf_get_error(const void *ptr); #endif diff --git a/tools/lib/subcmd/Makefile b/tools/lib/subcmd/Makefile index 3f8cc44a0dbd..3d1c3b5b5150 100644 --- a/tools/lib/subcmd/Makefile +++ b/tools/lib/subcmd/Makefile @@ -19,7 +19,13 @@ MAKEFLAGS += --no-print-directory LIBFILE = $(OUTPUT)libsubcmd.a CFLAGS := $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -O6 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC +CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC + +ifeq ($(CC), clang) + CFLAGS += -O3 +else + CFLAGS += -O6 +endif # Treat warnings as errors unless directed not to ifneq ($(WERROR),0) diff --git a/tools/lib/subcmd/parse-options.c b/tools/lib/subcmd/parse-options.c index 8aad81151d50..6bc24025d054 100644 --- a/tools/lib/subcmd/parse-options.c +++ b/tools/lib/subcmd/parse-options.c @@ -270,6 +270,8 @@ static int get_value(struct parse_opt_ctx_t *p, } if (get_arg(p, opt, flags, &arg)) return -1; + if (arg[0] == '-') + return opterror(opt, "expects an unsigned numerical value", flags); *(unsigned int *)opt->value = strtol(arg, (char **)&s, 10); if (*s) return opterror(opt, "expects a numerical value", flags); @@ -302,6 +304,8 @@ static int get_value(struct parse_opt_ctx_t *p, } if (get_arg(p, opt, flags, &arg)) return -1; + if (arg[0] == '-') + return opterror(opt, "expects an unsigned numerical value", flags); *(u64 *)opt->value = strtoull(arg, (char **)&s, 10); if (*s) return opterror(opt, "expects a numerical value", flags); diff --git a/tools/lib/subcmd/parse-options.h b/tools/lib/subcmd/parse-options.h index 11c3be3bcce7..f054ca1b899d 100644 --- a/tools/lib/subcmd/parse-options.h +++ b/tools/lib/subcmd/parse-options.h @@ -1,6 +1,7 @@ #ifndef __SUBCMD_PARSE_OPTIONS_H #define __SUBCMD_PARSE_OPTIONS_H +#include #include #include @@ -132,32 +133,32 @@ struct option { #define OPT_UINTEGER(s, l, v, h) { .type = OPTION_UINTEGER, .short_name = (s), .long_name = (l), .value = check_vtype(v, unsigned int *), .help = (h) } #define OPT_LONG(s, l, v, h) { .type = OPTION_LONG, .short_name = (s), .long_name = (l), .value = check_vtype(v, long *), .help = (h) } #define OPT_U64(s, l, v, h) { .type = OPTION_U64, .short_name = (s), .long_name = (l), .value = check_vtype(v, u64 *), .help = (h) } -#define OPT_STRING(s, l, v, a, h) { .type = OPTION_STRING, .short_name = (s), .long_name = (l), .value = check_vtype(v, const char **), (a), .help = (h) } +#define OPT_STRING(s, l, v, a, h) { .type = OPTION_STRING, .short_name = (s), .long_name = (l), .value = check_vtype(v, const char **), .argh = (a), .help = (h) } #define OPT_STRING_OPTARG(s, l, v, a, h, d) \ { .type = OPTION_STRING, .short_name = (s), .long_name = (l), \ - .value = check_vtype(v, const char **), (a), .help = (h), \ + .value = check_vtype(v, const char **), .argh =(a), .help = (h), \ .flags = PARSE_OPT_OPTARG, .defval = (intptr_t)(d) } #define OPT_STRING_OPTARG_SET(s, l, v, os, a, h, d) \ { .type = OPTION_STRING, .short_name = (s), .long_name = (l), \ - .value = check_vtype(v, const char **), (a), .help = (h), \ + .value = check_vtype(v, const char **), .argh = (a), .help = (h), \ .flags = PARSE_OPT_OPTARG, .defval = (intptr_t)(d), \ .set = check_vtype(os, bool *)} -#define OPT_STRING_NOEMPTY(s, l, v, a, h) { .type = OPTION_STRING, .short_name = (s), .long_name = (l), .value = check_vtype(v, const char **), (a), .help = (h), .flags = PARSE_OPT_NOEMPTY} +#define OPT_STRING_NOEMPTY(s, l, v, a, h) { .type = OPTION_STRING, .short_name = (s), .long_name = (l), .value = check_vtype(v, const char **), .argh = (a), .help = (h), .flags = PARSE_OPT_NOEMPTY} #define OPT_DATE(s, l, v, h) \ { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = "time", .help = (h), .callback = parse_opt_approxidate_cb } #define OPT_CALLBACK(s, l, v, a, h, f) \ - { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), (a), .help = (h), .callback = (f) } + { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = (a), .help = (h), .callback = (f) } #define OPT_CALLBACK_NOOPT(s, l, v, a, h, f) \ - { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), (a), .help = (h), .callback = (f), .flags = PARSE_OPT_NOARG } + { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = (a), .help = (h), .callback = (f), .flags = PARSE_OPT_NOARG } #define OPT_CALLBACK_DEFAULT(s, l, v, a, h, f, d) \ - { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), (a), .help = (h), .callback = (f), .defval = (intptr_t)d, .flags = PARSE_OPT_LASTARG_DEFAULT } + { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = (a), .help = (h), .callback = (f), .defval = (intptr_t)d, .flags = PARSE_OPT_LASTARG_DEFAULT } #define OPT_CALLBACK_DEFAULT_NOOPT(s, l, v, a, h, f, d) \ { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l),\ - .value = (v), (a), .help = (h), .callback = (f), .defval = (intptr_t)d,\ + .value = (v), .arg = (a), .help = (h), .callback = (f), .defval = (intptr_t)d,\ .flags = PARSE_OPT_LASTARG_DEFAULT | PARSE_OPT_NOARG} #define OPT_CALLBACK_OPTARG(s, l, v, d, a, h, f) \ { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), \ - .value = (v), (a), .help = (h), .callback = (f), \ + .value = (v), .argh = (a), .help = (h), .callback = (f), \ .flags = PARSE_OPT_OPTARG, .data = (d) } /* parse_options() will filter out the processed options and leave the diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile index 2616c66e10c1..47076b15eebe 100644 --- a/tools/lib/traceevent/Makefile +++ b/tools/lib/traceevent/Makefile @@ -257,10 +257,16 @@ define do_install_plugins endef define do_generate_dynamic_list_file - (echo '{'; \ - $(NM) -u -D $1 | awk 'NF>1 {print "\t"$$2";"}' | sort -u; \ - echo '};'; \ - ) > $2 + symbol_type=`$(NM) -u -D $1 | awk 'NF>1 {print $$1}' | \ + xargs echo "U W w" | tr ' ' '\n' | sort -u | xargs echo`;\ + if [ "$$symbol_type" = "U W w" ];then \ + (echo '{'; \ + $(NM) -u -D $1 | awk 'NF>1 {print "\t"$$2";"}' | sort -u;\ + echo '};'; \ + ) > $2; \ + else \ + (echo Either missing one of [$1] or bad version of $(NM)) 1>&2;\ + fi endef install_lib: all_cmd install_plugins diff --git a/tools/lib/traceevent/kbuffer-parse.c b/tools/lib/traceevent/kbuffer-parse.c index 65984f1c2974..c94e3641b046 100644 --- a/tools/lib/traceevent/kbuffer-parse.c +++ b/tools/lib/traceevent/kbuffer-parse.c @@ -315,6 +315,7 @@ static unsigned int old_update_pointers(struct kbuffer *kbuf) extend += delta; delta = extend; ptr += 4; + length = 0; break; case OLD_RINGBUF_TYPE_TIME_STAMP: diff --git a/tools/lib/traceevent/plugin_function.c b/tools/lib/traceevent/plugin_function.c index a00ec190821a..42dbf73758f3 100644 --- a/tools/lib/traceevent/plugin_function.c +++ b/tools/lib/traceevent/plugin_function.c @@ -130,7 +130,7 @@ static int function_handler(struct trace_seq *s, struct pevent_record *record, unsigned long long pfunction; const char *func; const char *parent; - int index; + int index = 0; if (pevent_get_field_val(s, event, "ip", record, &function, 1)) return trace_seq_putc(s, '!'); diff --git a/tools/perf/Build b/tools/perf/Build index b12d5d1666e3..9b79f8d7db50 100644 --- a/tools/perf/Build +++ b/tools/perf/Build @@ -3,10 +3,12 @@ perf-y += builtin-annotate.o perf-y += builtin-config.o perf-y += builtin-diff.o perf-y += builtin-evlist.o +perf-y += builtin-ftrace.o perf-y += builtin-help.o perf-y += builtin-sched.o perf-y += builtin-buildid-list.o perf-y += builtin-buildid-cache.o +perf-y += builtin-kallsyms.o perf-y += builtin-list.o perf-y += builtin-record.o perf-y += builtin-report.o @@ -39,8 +41,7 @@ CFLAGS_builtin-help.o += $(paths) CFLAGS_builtin-timechart.o += $(paths) CFLAGS_perf.o += -DPERF_HTML_PATH="BUILD_STR($(htmldir_SQ))" \ -DPERF_EXEC_PATH="BUILD_STR($(perfexecdir_SQ))" \ - -DPREFIX="BUILD_STR($(prefix_SQ))" \ - -include $(OUTPUT)PERF-VERSION-FILE + -DPREFIX="BUILD_STR($(prefix_SQ))" CFLAGS_builtin-trace.o += -DSTRACE_GROUPS_DIR="BUILD_STR($(STRACE_GROUPS_DIR_SQ))" CFLAGS_builtin-report.o += -DTIPDIR="BUILD_STR($(tipdir_SQ))" CFLAGS_builtin-report.o += -DDOCDIR="BUILD_STR($(srcdir_SQ)/Documentation)" diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt index 3f06730c7f47..2da07e51e119 100644 --- a/tools/perf/Documentation/perf-c2c.txt +++ b/tools/perf/Documentation/perf-c2c.txt @@ -248,7 +248,7 @@ Following fields are available and governs the final Code address, Code symbol, Shared Object, Source line dso - coalesced by shared object -By default the coalescing is setup with 'pid,tid,iaddr'. +By default the coalescing is setup with 'pid,iaddr'. STDIO OUTPUT ------------ diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt index 9365b75fd04f..5b4fff3adc4b 100644 --- a/tools/perf/Documentation/perf-config.txt +++ b/tools/perf/Documentation/perf-config.txt @@ -498,6 +498,18 @@ Variables But if this option is 'no-cache', it will not update the build-id cache. 'skip' skips post-processing and does not update the cache. +diff.*:: + diff.order:: + This option sets the number of columns to sort the result. + The default is 0, which means sorting by baseline. + Setting it to 1 will sort the result by delta (or other + compute method selected). + + diff.compute:: + This options sets the method for computing the diff result. + Possible values are 'delta', 'delta-abs', 'ratio' and + 'wdiff'. Default is 'delta'. + SEE ALSO -------- linkperf:perf[1] diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt index 3e9490b9c533..66dbe3dee74b 100644 --- a/tools/perf/Documentation/perf-diff.txt +++ b/tools/perf/Documentation/perf-diff.txt @@ -86,8 +86,9 @@ OPTIONS -c:: --compute:: - Differential computation selection - delta,ratio,wdiff (default is delta). - See COMPARISON METHODS section for more info. + Differential computation selection - delta, ratio, wdiff, delta-abs + (default is delta-abs). Default can be changed using diff.compute + config option. See COMPARISON METHODS section for more info. -p:: --period:: @@ -99,7 +100,11 @@ OPTIONS -o:: --order:: - Specify compute sorting column number. + Specify compute sorting column number. 0 means sorting by baseline + overhead and 1 (default) means sorting by computed value of column 1 + (data from the first file other base baseline). Values more than 1 + can be used only if enough data files are provided. + The default value can be set using the diff.order config option. --percentage:: Determine how to display the overhead percentage of filtered entries. @@ -181,6 +186,10 @@ delta relative to how entries are filtered. Use --percentage=absolute to prevent such fluctuation. +delta-abs +~~~~~~~~~ +Same as 'delta` method, but sort the result with the absolute values. + ratio ~~~~~ If specified the 'Ratio' column is displayed with value 'r' computed as: diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt new file mode 100644 index 000000000000..2d96de6132a9 --- /dev/null +++ b/tools/perf/Documentation/perf-ftrace.txt @@ -0,0 +1,36 @@ +perf-ftrace(1) +============= + +NAME +---- +perf-ftrace - simple wrapper for kernel's ftrace functionality + + +SYNOPSIS +-------- +[verse] +'perf ftrace' + +DESCRIPTION +----------- +The 'perf ftrace' command is a simple wrapper of kernel's ftrace +functionality. It only supports single thread tracing currently and +just reads trace_pipe in text and then write it to stdout. + +The following options apply to perf ftrace. + +OPTIONS +------- + +-t:: +--tracer=:: + Tracer to use: function_graph or function. + +-v:: +--verbose=:: + Verbosity level. + + +SEE ALSO +-------- +linkperf:perf-record[1], linkperf:perf-trace[1] diff --git a/tools/perf/Documentation/perf-kallsyms.txt b/tools/perf/Documentation/perf-kallsyms.txt new file mode 100644 index 000000000000..954ea9e21236 --- /dev/null +++ b/tools/perf/Documentation/perf-kallsyms.txt @@ -0,0 +1,24 @@ +perf-kallsyms(1) +============== + +NAME +---- +perf-kallsyms - Searches running kernel for symbols + +SYNOPSIS +-------- +[verse] +'perf kallsyms symbol_name[,symbol_name...]' + +DESCRIPTION +----------- +This command searches the running kernel kallsyms file for the given symbol(s) +and prints information about it, including the DSO, the kallsyms begin/end +addresses and the addresses in the ELF kallsyms symbol table (for symbols in +modules). + +OPTIONS +------- +-v:: +--verbose=:: + Increase verbosity level, showing details about symbol table loading, etc. diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 5054d9147f0f..27256bc68eda 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -421,9 +421,19 @@ Configure all used events to run in user space. --timestamp-filename Append timestamp to output file name. ---switch-output:: +--switch-output[=mode]:: Generate multiple perf.data files, timestamp prefixed, switching to a new one -when receiving a SIGUSR2. +based on 'mode' value: + "signal" - when receiving a SIGUSR2 (default value) or + - when reaching the size threshold, size is expected to + be a number with appended unit character - B/K/M/G +