linux-perf-users.vger.kernel.org archive mirror
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Clark Williams <williams@redhat.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Karl Rister <krister@redhat.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Brendan Gregg <brendan.d.gregg@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Krister Johansen <kjlx@templeofstupid.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Song Liu <songliubraving@fb.com>,
	Stanislav Fomichev <sdf@google.com>,
	Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Subject: [PATCH 35/37] perf evlist: Use unshare(CLONE_FS) in sb threads to let setns(CLONE_NEWNS) work
Date: Thu, 29 Aug 2019 11:39:15 -0300
Message-ID: <20190829143917.29745-36-acme@kernel.org>
In-Reply-To: <20190829143917.29745-1-acme@kernel.org>

From: Arnaldo Carvalho de Melo <acme@redhat.com>

When we started using a thread to catch the PERF_RECORD_BPF_EVENT
metadata events and then ask the kernel for further info (BTF, etc.)
about BPF programs shortly after they get loaded, we forgot to call
unshare(CLONE_FS), as was done in:

  868a832918f6 ("perf top: Support lookup of symbols in other mount namespaces.")

Do it so that we can enter the namespaces to read the build-ids at the
end of a 'perf record' session for the DSOs that had hits.
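
For reference, the constraint can be exercised outside of perf with a
minimal sketch like the one below. It is not the perf code: the names
ns_probe, ns_probe_thread and ns_probe_run are made up for this example,
and it just re-enters its own mount namespace. It assumes the caller has
the privileges setns(2) needs; without them setns() fails with EPERM
regardless of the unshare() call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical result record for one run of the worker thread. */
struct ns_probe {
	int unshare_errno;	/* 0 if unshare(CLONE_FS) succeeded */
	int setns_errno;	/* 0 if setns() succeeded, -1 if not attempted */
};

static void *ns_probe_thread(void *arg)
{
	struct ns_probe *res = arg;
	int fd;

	res->setns_errno = -1;

	/* Give this thread its own fs_struct (cwd, root, umask). */
	res->unshare_errno = unshare(CLONE_FS) ? errno : 0;
	if (res->unshare_errno)
		return NULL;

	/* Re-entering our own mount namespace is enough to exercise setns(). */
	fd = open("/proc/self/ns/mnt", O_RDONLY);
	if (fd < 0) {
		res->setns_errno = errno;
		return NULL;
	}
	res->setns_errno = setns(fd, CLONE_NEWNS) ? errno : 0;
	close(fd);
	return NULL;
}

/* Run the probe in a secondary thread; returns 0 if the thread ran. */
static int ns_probe_run(struct ns_probe *res)
{
	pthread_t th;

	if (pthread_create(&th, NULL, ns_probe_thread, res))
		return -1;
	return pthread_join(th, NULL) ? -1 : 0;
}
```

Calling ns_probe_run() as root should leave both errno fields at zero.
Dropping the unshare() call from the thread is expected to make setns()
fail with EINVAL instead, since the fs_struct is then still shared with
the main thread.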

Before:

Start a 'stress-ng --cpu 8' inside a container and then, outside the
container, run:

  # perf record -a --namespaces sleep 5
  # perf buildid-list | grep stress-ng
  #

We would end up with a 'perf.data' file that had no entry in its
build-id table for the /usr/bin/stress-ng binary inside the container
that got tons of PERF_RECORD_SAMPLEs.

After:

  # perf buildid-list | grep stress-ng
  f2ed02c68341183a124b9b0f6e2e6c493c465b29 /usr/bin/stress-ng
  #

Then it's just a matter of making sure that the debuginfo package for
that binary is available in a place where 'perf report' looks for
build-id keyed ELF files, which, in my case, on an f30 notebook, was a
matter of installing the debuginfo file for the distro used in the
container, fedora 31:

  # rpm -ivh http://fedora.c3sl.ufpr.br/linux/development/31/Everything/x86_64/debug/tree/Packages/s/stress-ng-debuginfo-0.07.29-10.fc31.x86_64.rpm

Then, because perf currently looks for those debuginfo files (richer ELF
symtab) inside that namespace (look at the setns calls):

  openat(AT_FDCWD, "/proc/self/ns/mnt", O_RDONLY) = 137
  openat(AT_FDCWD, "/proc/13169/ns/mnt", O_RDONLY) = 139
  setns(139, CLONE_NEWNS)                 = 0
  stat("/usr/bin/stress-ng", {st_mode=S_IFREG|0755, st_size=3065416, ...}) = 0
  openat(AT_FDCWD, "/usr/bin/stress-ng", O_RDONLY) = 140
  fcntl(140, F_GETFD)                     = 0
  fstat(140, {st_mode=S_IFREG|0755, st_size=3065416, ...}) = 0
  mmap(NULL, 3065416, PROT_READ, MAP_PRIVATE, 140, 0) = 0x7ff2fdc5b000
  munmap(0x7ff2fdc5b000, 3065416)         = 0
  close(140)                              = 0
  stat("stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
  stat("/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
  stat("/usr/bin/.debug/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
  stat("/usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
  stat("/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29", 0x7fff45d711e0) = -1 ENOENT (No such file or directory)

To only then go back to the "host" namespace to look just in the user's
~/.debug cache:

  setns(137, CLONE_NEWNS)                 = 0
  chdir("/root")                          = 0
  close(137)                              = 0
  close(139)                              = 0
  stat("/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf", 0x7fff45d732e0) = -1 ENOENT (No such file or directory)

It continues to fail to resolve symbols:

  # perf report | grep stress-ng | head -5
     9.50%  stress-ng-cpu    stress-ng    [.] 0x0000000000021ac1
     8.58%  stress-ng-cpu    stress-ng    [.] 0x0000000000021ab4
     8.51%  stress-ng-cpu    stress-ng    [.] 0x0000000000021489
     7.17%  stress-ng-cpu    stress-ng    [.] 0x00000000000219b6
     3.93%  stress-ng-cpu    stress-ng    [.] 0x0000000000021478
  #
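
The four ".debug" locations probed inside the container namespace in the
trace above can be reproduced with a small sketch. This only mirrors the
order seen in that strace output; it is not lifted from the perf sources,
and debuglink_candidates, DEBUG_CANDIDATES and DEBUG_PATH_MAX are names
made up for the example.

```c
#include <stdio.h>
#include <string.h>

#define DEBUG_CANDIDATES 4
#define DEBUG_PATH_MAX   4096

/*
 * Hypothetical helper: fill 'out' with the ".debug" locations in the order
 * the trace shows them being stat()ed. 'binary' must be an absolute path.
 * Returns the number of candidates, or -1 if 'binary' has no directory part.
 */
static int debuglink_candidates(const char *binary, const char *debuglink,
				char out[DEBUG_CANDIDATES][DEBUG_PATH_MAX])
{
	const char *slash = strrchr(binary, '/');
	int dirlen;

	if (!slash)
		return -1;
	dirlen = (int)(slash - binary);

	/* 1. bare name, relative to the current directory */
	snprintf(out[0], DEBUG_PATH_MAX, "%s", debuglink);
	/* 2. next to the binary */
	snprintf(out[1], DEBUG_PATH_MAX, "%.*s/%s", dirlen, binary, debuglink);
	/* 3. in a .debug/ subdirectory next to the binary */
	snprintf(out[2], DEBUG_PATH_MAX, "%.*s/.debug/%s", dirlen, binary, debuglink);
	/* 4. under the global /usr/lib/debug tree */
	snprintf(out[3], DEBUG_PATH_MAX, "/usr/lib/debug%.*s/%s", dirlen, binary, debuglink);
	return DEBUG_CANDIDATES;
}
```

With "/usr/bin/stress-ng" and the debuglink name from the trace, the
last candidate is the /usr/lib/debug/usr/bin/ path that the debuginfo rpm
installs on the host, which is never seen because the lookup runs inside
the container's mount namespace.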

To overcome that we use:

  # perf buildid-cache -v --add /usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug
  Adding f2ed02c68341183a124b9b0f6e2e6c493c465b29 /usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug: Ok
  #
  # ls -la /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
  -rw-r--r--. 3 root root 2401184 Jul 27 07:03 /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
  # file /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
  /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter \004, BuildID[sha1]=f2ed02c68341183a124b9b0f6e2e6c493c465b29, for GNU/Linux 3.2.0, with debug_info, not stripped, too many notes (256)
  #
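
The cache entry above follows the layout visible in the listing: the
first two hex digits of the build-id become a directory, the rest a
subdirectory, and the file itself is called "elf". A minimal sketch of
that mapping, with buildid_cache_path being a made-up name and the cache
directory passed in by the caller rather than discovered the way perf
does it:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<debugdir>/.build-id/xx/yyyy.../elf" from a
 * hex build-id string. Returns 0 on success, -1 if the build-id is too
 * short or the result does not fit in 'out'.
 */
static int buildid_cache_path(const char *debugdir, const char *build_id_hex,
			      char *out, size_t outsz)
{
	int n;

	if (strlen(build_id_hex) < 3)
		return -1;

	/* "%.2s" prints only the first two hex digits of the build-id. */
	n = snprintf(out, outsz, "%s/.build-id/%.2s/%s/elf",
		     debugdir, build_id_hex, build_id_hex + 2);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Passing "/root/.debug" and the build-id reported by 'perf buildid-list'
gives back the path shown in the 'ls -la' output above.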

Now it finally works:

  # perf report | grep stress-ng | head -5
    23.59%  stress-ng-cpu    stress-ng    [.] ackermann
    23.33%  stress-ng-cpu    stress-ng    [.] is_prime
    17.36%  stress-ng-cpu    stress-ng    [.] stress_cpu_sieve
     6.08%  stress-ng-cpu    stress-ng    [.] stress_cpu_correlate
     3.55%  stress-ng-cpu    stress-ng    [.] queens_try
  #

I'll make sure that it looks for the build-id keyed files in both the
"host" namespace (the namespace the user running 'perf record' was in at
the time of the recording) and in the container namespace, as it
shouldn't matter where a content-based key lookup finds the ELF file to
use in resolving symbols, etc.

Reported-by: Karl Rister <krister@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Fixes: 657ee5531903 ("perf evlist: Introduce side band thread")
Link: https://lkml.kernel.org/n/tip-g79k0jz41adiaeuqud742t2l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 5ad92fa72e78..253dd8dd0e12 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -21,6 +21,7 @@
 #include "bpf-event.h"
 #include <signal.h>
 #include <unistd.h>
+#include <sched.h>
 
 #include "parse-events.h"
 #include <subcmd/parse-options.h>
@@ -1824,6 +1825,14 @@ static void *perf_evlist__poll_thread(void *arg)
 	struct evlist *evlist = arg;
 	bool draining = false;
 	int i, done = 0;
+	/*
+	 * In order to read symbols from other namespaces perf needs to call
+	 * setns(2).  This isn't permitted if the fs_struct has multiple users.
+	 * unshare(2) the fs so that we may continue to setns into namespaces
+	 * that we're observing when, for instance, reading the build-ids at
+	 * the end of a 'perf record' session.
+	 */
+	unshare(CLONE_FS);
 
 	while (!done) {
 		bool got_data = false;
-- 
2.21.0