public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL 0/1] perf probe fix
@ 2017-06-23  0:02 Arnaldo Carvalho de Melo
  2017-06-23  0:02 ` [PATCH 1/1] perf probe: Fix probe definition for inlined functions Arnaldo Carvalho de Melo
  2017-06-23  8:04 ` [GIT PULL 0/1] perf probe fix Ingo Molnar
  0 siblings, 2 replies; 3+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-06-23  0:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Björn Töpel,
	Magnus Karlsson, Masami Hiramatsu, stable,
	Arnaldo Carvalho de Melo

Hi Ingo,

	Please consider pulling,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit fb3a5055cd7098f8d1dd0cd38d7172211113255f:

  perf/x86/intel: Add 1G DTLB load/store miss support for SKL (2017-06-22 11:07:08 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-4.12-20170622

for you to fetch changes up to 7598f8bc1383ffd77686cb4e92e749bef3c75937:

  perf probe: Fix probe definition for inlined functions (2017-06-22 16:08:09 -0300)

----------------------------------------------------------------
perf probe fix:

- Do not double the offset of inline expansions when using
  'perf probe' on inlined functions (Björn Töpel)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Björn Töpel (1):
      perf probe: Fix probe definition for inlined functions

 tools/perf/util/probe-event.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Test results:

The first ones are container (docker) based builds of tools/perf with and
without libelf support, objtool where it is supported and samples/bpf/, ditto.
Where clang is available, it is also used to build perf with/without libelf.

Several are cross builds, the ones with -x-ARCH, and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.

Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping having this in place.

  # dm
   1 alpine:3.4: Ok
   2 alpine:3.5: Ok
   3 alpine:3.6: Ok
   4 alpine:edge: Ok
   5 android-ndk:r12b-arm: Ok
   6 archlinux:latest: Ok
   7 centos:5: Ok
   8 centos:6: Ok
   9 centos:7: Ok
  10 debian:7: Ok
  11 debian:8: Ok
  12 debian:9: Ok
  13 debian:experimental: Ok
  14 debian:experimental-x-arm64: Ok
  15 debian:experimental-x-mips: Ok
  16 debian:experimental-x-mips64: Ok
  17 debian:experimental-x-mipsel: Ok
  18 fedora:20: Ok
  19 fedora:21: Ok
  20 fedora:22: Ok
  21 fedora:23: Ok
  22 fedora:24: Ok
  23 fedora:24-x-ARC-uClibc: Ok
  24 fedora:25: Ok
  25 fedora:rawhide: Ok
  26 mageia:5: Ok
  27 opensuse:13.2: Ok
  28 opensuse:42.1: Ok
  29 opensuse:tumbleweed: Ok
  30 ubuntu:12.04.5: Ok
  31 ubuntu:14.04.4: Ok
  32 ubuntu:14.04.4-x-linaro-arm64: Ok
  33 ubuntu:15.10: Ok
  34 ubuntu:16.04: Ok
  35 ubuntu:16.04-x-arm: Ok
  36 ubuntu:16.04-x-arm64: Ok
  37 ubuntu:16.04-x-powerpc: Ok
  38 ubuntu:16.04-x-powerpc64: Ok
  39 ubuntu:16.04-x-powerpc64el: Ok
  40 ubuntu:16.04-x-s390: Ok
  41 ubuntu:16.10: Ok
  42 ubuntu:17.04: Ok

  # uname -a
  Linux jouet 4.12.0-rc4+ #1 SMP Fri Jun 9 12:59:23 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
  [root@jouet ~]# perf test
   1: vmlinux symtab matches kallsyms            : Ok
   2: Detect openat syscall event                : Ok
   3: Detect openat syscall event on all cpus    : Ok
   4: Read samples using the mmap interface      : Ok
   5: Parse event definition strings             : Ok
   6: Simple expression parser                   : Ok
   7: PERF_RECORD_* events & perf_sample fields  : Ok
   8: Parse perf pmu format                      : Ok
   9: DSO data read                              : Ok
  10: DSO data cache                             : Ok
  11: DSO data reopen                            : Ok
  12: Roundtrip evsel->name                      : Ok
  13: Parse sched tracepoints fields             : Ok
  14: syscalls:sys_enter_openat event fields     : Ok
  15: Setup struct perf_event_attr               : Ok
  16: Match and link multiple hists              : Ok
  17: 'import perf' in python                    : Ok
  18: Breakpoint overflow signal handler         : Ok
  19: Breakpoint overflow sampling               : Ok
  20: Number of exit events of a simple workload : Ok
  21: Software clock events period values        : Ok
  22: Object code reading                        : Ok
  23: Sample parsing                             : Ok
  24: Use a dummy software event to keep tracking: Ok
  25: Parse with no sample_id_all bit set        : Ok
  26: Filter hist entries                        : Ok
  27: Lookup mmap thread                         : Ok
  28: Share thread mg                            : Ok
  29: Sort output of hist entries                : Ok
  30: Cumulate child hist entries                : Ok
  31: Track with sched_switch                    : Ok
  32: Filter fds with revents mask in a fdarray  : Ok
  33: Add fd to a fdarray, making it autogrow    : Ok
  34: kmod_path__parse                           : Ok
  35: Thread map                                 : Ok
  36: LLVM search and compile                    :
  36.1: Basic BPF llvm compile                    : Ok
  36.2: kbuild searching                          : Ok
  36.3: Compile source for BPF prologue generation: Ok
  36.4: Compile source for BPF relocation         : Ok
  37: Session topology                           : Ok
  38: BPF filter                                 :
  38.1: Basic BPF filtering                      : Ok
  38.2: BPF pinning                              : Ok
  38.3: BPF prologue generation                  : Ok
  38.4: BPF relocation checker                   : Ok
  39: Synthesize thread map                      : Ok
  40: Remove thread map                          : Ok
  41: Synthesize cpu map                         : Ok
  42: Synthesize stat config                     : Ok
  43: Synthesize stat                            : Ok
  44: Synthesize stat round                      : Ok
  45: Synthesize attr update                     : Ok
  46: Event times                                : Ok
  47: Read backward ring buffer                  : Ok
  48: Print cpu map                              : Ok
  49: Probe SDT events                           : Ok
  50: is_printable_array                         : Ok
  51: Print bitmap                               : Ok
  52: perf hooks                                 : Ok
  53: builtin clang support                      : Skip (not compiled in)
  54: unit_number__scnprintf                     : Ok
  55: x86 rdpmc                                  : Ok
  56: Convert perf time to TSC                   : Ok
  57: DWARF unwind                               : Ok
  58: x86 instruction decoder - new instructions : Ok
  59: Intel cqm nmi context read                 : Skip
  #

  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/linux/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
  cd . && make FEATURE_DUMP_COPY=/home/acme/git/linux/tools/perf/BUILD_TEST_FEATURE_DUMP feature-dump
                make_no_newt_O: make NO_NEWT=1
                   make_pure_O: make
           make_no_libbionic_O: make NO_LIBBIONIC=1
               make_no_slang_O: make NO_SLANG=1
        make_with_babeltrace_O: make LIBBABELTRACE=1
                 make_perf_o_O: make perf.o
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
              make_no_libbpf_O: make NO_LIBBPF=1
                make_no_gtk2_O: make NO_GTK2=1
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
           make_no_libpython_O: make NO_LIBPYTHON=1
                 make_static_O: make LDFLAGS=-static
              make_clean_all_O: make clean all
             make_no_libnuma_O: make NO_LIBNUMA=1
             make_util_map_o_O: make util/map.o
                   make_tags_O: make tags
            make_no_demangle_O: make NO_DEMANGLE=1
                make_install_O: make install
                  make_debug_O: make DEBUG=1
         make_with_clangllvm_O: make LIBCLANGLLVM=1
            make_install_bin_O: make install-bin
                   make_help_O: make help
            make_no_auxtrace_O: make NO_AUXTRACE=1
       make_util_pmu_bison_o_O: make util/pmu-bison.o
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
             make_no_libperl_O: make NO_LIBPERL=1
           make_no_backtrace_O: make NO_BACKTRACE=1
           make_no_libunwind_O: make NO_LIBUNWIND=1
   make_install_prefix_slash_O: make install prefix=/tmp/krava/
              make_no_libelf_O: make NO_LIBELF=1
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
            make_no_libaudit_O: make NO_LIBAUDIT=1
         make_install_prefix_O: make install prefix=/tmp/krava
                    make_doc_O: make doc
  OK
  make: Leaving directory '/home/acme/git/linux/tools/perf'
  $ 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/1] perf probe: Fix probe definition for inlined functions
  2017-06-23  0:02 [GIT PULL 0/1] perf probe fix Arnaldo Carvalho de Melo
@ 2017-06-23  0:02 ` Arnaldo Carvalho de Melo
  2017-06-23  8:04 ` [GIT PULL 0/1] perf probe fix Ingo Molnar
  1 sibling, 0 replies; 3+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-06-23  0:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Björn Töpel, stable,
	Arnaldo Carvalho de Melo

From: Björn Töpel <bjorn.topel@intel.com>

In commit 613f050d68a8 ("perf probe: Fix to probe on gcc generated
functions in modules"), the offset from symbol is, incorrectly, added
to the trace point address. This leads to incorrect probe trace points
for inlined functions and when using relative line number on symbols.

Prior this patch:
  $ perf probe -m nf_nat -D in_range
  p:probe/in_range nf_nat:in_range.isra.9+0
  $ perf probe -m i40e -D i40e_clean_rx_irq
  p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212
  $ perf probe -m i40e -D i40e_clean_rx_irq:16
  p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626

After:
  $ perf probe -m nf_nat -D in_range
  p:probe/in_range nf_nat:in_range.isra.9+0
  $ perf probe -m i40e -D i40e_clean_rx_irq
  p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106
  $ perf probe -m i40e -D i40e_clean_rx_irq:16
  p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665

Committer testing:

Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask what are
the functions that while not being explicitely marked as inline, were inlined
by the compiler:

  # pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head
  __ew32
  e1000_regdump
  e1000e_dump_ps_pages
  e1000_desc_unused
  e1000e_systim_to_hwtstamp
  e1000e_rx_hwtstamp
  e1000e_update_rdt_wa
  e1000e_update_tdt_wa
  e1000_put_txbuf
  e1000_consume_page

Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of
them:

  # perf probe -m e1000e -D e1000e_rx_hwtstamp
  p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74

  # perf probe -m e1000e -D e1000_consume_page
  p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876
  p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506
  p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074

Now lets concentrate on the 'e1000_consume_page' one, that was inlined twice in
e1000_clean_jumbo_rx_irq(), lets see what readelf says about the DWARF tags for
that function:

  $ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
  <SNIP>
  <1><13e27b>: Abbrev Number: 121 (DW_TAG_subprogram)
    <13e27c>   DW_AT_name        : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq
    <13e287>   DW_AT_low_pc      : 0x17a30
  <3><13e6ef>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
    <13e6f0>   DW_AT_abstract_origin: <0x13ed2c>
    <13e6f4>   DW_AT_low_pc      : 0x17be6
  <SNIP>
  <1><13ed2c>: Abbrev Number: 142 (DW_TAG_subprogram)
     <13ed2e>   DW_AT_name        : (indirect string, offset: 0xa54c3): e1000_consume_page

So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is
inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s
address, gives us the offset we should use in the probe definition:

  0x17be6 - 0x17a30 = 438

but above we have 876, which is twice as much.

Lets see the second inline expansion of e1000_consume_page() in
e1000_clean_jumbo_rx_irq():

  <3><13e86e>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
    <13e86f>   DW_AT_abstract_origin: <0x13ed2c>
    <13e873>   DW_AT_low_pc      : 0x17d21

  0x17d21 - 0x17a30 = 753

So we where adding it at twice the offset from the containing function as we
should.

And then after this patch:

  # perf probe -m e1000e -D e1000e_rx_hwtstamp
  p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37

  # perf probe -m e1000e -D e1000_consume_page
  p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438
  p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753
  p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353
  #

Which matches the two first expansions and shows that because we were
doubling the offset it would spill over the next function:

  readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
   673: 0000000000017a30  1626 FUNC    LOCAL  DEFAULT    2 e1000_clean_jumbo_rx_irq
   674: 0000000000018090  2013 FUNC    LOCAL  DEFAULT    2 e1000_clean_rx_irq_ps

This is the 3rd inline expansion of e1000_consume_page() in
e1000_clean_jumbo_rx_irq():

   <3><13ec77>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
    <13ec78>   DW_AT_abstract_origin: <0x13ed2c>
    <13ec7c>   DW_AT_low_pc      : 0x17f79

  0x17f79 - 0x17a30 = 1353

 So:

   0x17a30 + 2 * 1353 = 0x184c2

  And:

   0x184c2 - 0x18090 = 1074

Which explains the bogus third expansion for e1000_consume_page() to end up at:

   p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074

All fixed now :-)

[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 613f050d68a8 ("perf probe: Fix to probe on gcc generated functions in modules")
Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/probe-event.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 84e7e698411e..a2670e9d652d 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -619,7 +619,7 @@ static int post_process_probe_trace_point(struct probe_trace_point *tp,
 					   struct map *map, unsigned long offs)
 {
 	struct symbol *sym;
-	u64 addr = tp->address + tp->offset - offs;
+	u64 addr = tp->address - offs;
 
 	sym = map__find_symbol(map, addr);
 	if (!sym)
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [GIT PULL 0/1] perf probe fix
  2017-06-23  0:02 [GIT PULL 0/1] perf probe fix Arnaldo Carvalho de Melo
  2017-06-23  0:02 ` [PATCH 1/1] perf probe: Fix probe definition for inlined functions Arnaldo Carvalho de Melo
@ 2017-06-23  8:04 ` Ingo Molnar
  1 sibling, 0 replies; 3+ messages in thread
From: Ingo Molnar @ 2017-06-23  8:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Björn Töpel, Magnus Karlsson,
	Masami Hiramatsu, stable, Arnaldo Carvalho de Melo


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit fb3a5055cd7098f8d1dd0cd38d7172211113255f:
> 
>   perf/x86/intel: Add 1G DTLB load/store miss support for SKL (2017-06-22 11:07:08 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-4.12-20170622
> 
> for you to fetch changes up to 7598f8bc1383ffd77686cb4e92e749bef3c75937:
> 
>   perf probe: Fix probe definition for inlined functions (2017-06-22 16:08:09 -0300)
> 
> ----------------------------------------------------------------
> perf probe fix:
> 
> - Do not double the offset of inline expansions when using
>   'perf probe' on inlined functions (Björn Töpel)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Björn Töpel (1):
>       perf probe: Fix probe definition for inlined functions
> 
>  tools/perf/util/probe-event.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Pulled, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-06-23  8:04 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-06-23  0:02 [GIT PULL 0/1] perf probe fix Arnaldo Carvalho de Melo
2017-06-23  0:02 ` [PATCH 1/1] perf probe: Fix probe definition for inlined functions Arnaldo Carvalho de Melo
2017-06-23  8:04 ` [GIT PULL 0/1] perf probe fix Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox