From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
Clark Williams <williams@redhat.com>,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Arnaldo Carvalho de Melo <acme@redhat.com>,
Karl Rister <krister@redhat.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Alexei Starovoitov <ast@kernel.org>,
Brendan Gregg <brendan.d.gregg@gmail.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Krister Johansen <kjlx@templeofstupid.com>,
Peter Zijlstra <peterz@infradead.org>,
Song Liu <songliubraving@fb.com>,
Stanislav Fomichev <sdf@google.com>,
Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Subject: [PATCH 35/37] perf evlist: Use unshare(CLONE_FS) in sb threads to let setns(CLONE_NEWNS) work
Date: Thu, 29 Aug 2019 11:39:15 -0300
Message-ID: <20190829143917.29745-36-acme@kernel.org>
In-Reply-To: <20190829143917.29745-1-acme@kernel.org>
From: Arnaldo Carvalho de Melo <acme@redhat.com>
When we started using a thread to catch the PERF_RECORD_BPF_EVENT meta
data events to then ask the kernel for further info (BTF, etc) for BPF
programs shortly after they get loaded, we forgot to use
unshare(CLONE_FS) as was done in:
868a832918f6 ("perf top: Support lookup of symbols in other mount namespaces.")
Do it so that we can enter the namespaces to read the build-ids at the
end of a 'perf record' session for the DSOs that had hits.
Before:
Starting 'stress-ng --cpus 8' inside a container and then, outside the
container, running:
# perf record -a --namespaces sleep 5
# perf buildid-list | grep stress-ng
#
We would end up with a 'perf.data' file that had no entry in its
build-id table for the /usr/bin/stress-ng binary inside the container
that got tons of PERF_RECORD_SAMPLEs.
After:
# perf buildid-list | grep stress-ng
f2ed02c68341183a124b9b0f6e2e6c493c465b29 /usr/bin/stress-ng
#
Then it's just a matter of making sure that the debuginfo package for that
binary is available in a place where 'perf report' looks for build-id keyed
ELF files, which, in my case, on an f30 notebook, was a matter of
installing the debuginfo file for the distro used in the container,
Fedora 31:
# rpm -ivh http://fedora.c3sl.ufpr.br/linux/development/31/Everything/x86_64/debug/tree/Packages/s/stress-ng-debuginfo-0.07.29-10.fc31.x86_64.rpm
Then, because perf currently looks for those debuginfo files (richer ELF
symtab) inside that namespace (look at the setns calls):
openat(AT_FDCWD, "/proc/self/ns/mnt", O_RDONLY) = 137
openat(AT_FDCWD, "/proc/13169/ns/mnt", O_RDONLY) = 139
setns(139, CLONE_NEWNS) = 0
stat("/usr/bin/stress-ng", {st_mode=S_IFREG|0755, st_size=3065416, ...}) = 0
openat(AT_FDCWD, "/usr/bin/stress-ng", O_RDONLY) = 140
fcntl(140, F_GETFD) = 0
fstat(140, {st_mode=S_IFREG|0755, st_size=3065416, ...}) = 0
mmap(NULL, 3065416, PROT_READ, MAP_PRIVATE, 140, 0) = 0x7ff2fdc5b000
munmap(0x7ff2fdc5b000, 3065416) = 0
close(140) = 0
stat("stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
stat("/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
stat("/usr/bin/.debug/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
stat("/usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
stat("/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29", 0x7fff45d711e0) = -1 ENOENT (No such file or directory)
To only then go back to the "host" namespace to look just in the user's
~/.debug cache:
setns(137, CLONE_NEWNS) = 0
chdir("/root") = 0
close(137) = 0
close(139) = 0
stat("/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf", 0x7fff45d732e0) = -1 ENOENT (No such file or directory)
It continues to fail to resolve symbols:
# perf report | grep stress-ng | head -5
9.50% stress-ng-cpu stress-ng [.] 0x0000000000021ac1
8.58% stress-ng-cpu stress-ng [.] 0x0000000000021ab4
8.51% stress-ng-cpu stress-ng [.] 0x0000000000021489
7.17% stress-ng-cpu stress-ng [.] 0x00000000000219b6
3.93% stress-ng-cpu stress-ng [.] 0x0000000000021478
#
To overcome that we use:
# perf buildid-cache -v --add /usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug
Adding f2ed02c68341183a124b9b0f6e2e6c493c465b29 /usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug: Ok
#
# ls -la /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
-rw-r--r--. 3 root root 2401184 Jul 27 07:03 /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
# file /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter \004, BuildID[sha1]=f2ed02c68341183a124b9b0f6e2e6c493c465b29, for GNU/Linux 3.2.0, with debug_info, not stripped, too many notes (256)
#
Now it finally works:
# perf report | grep stress-ng | head -5
23.59% stress-ng-cpu stress-ng [.] ackermann
23.33% stress-ng-cpu stress-ng [.] is_prime
17.36% stress-ng-cpu stress-ng [.] stress_cpu_sieve
6.08% stress-ng-cpu stress-ng [.] stress_cpu_correlate
3.55% stress-ng-cpu stress-ng [.] queens_try
#
I'll make sure that it looks for the build-id keyed files in both the
"host" namespace (the namespace the user running 'perf record' was in at the
time of the recording) and in the container namespace, as it shouldn't
matter where a content based key lookup finds the ELF file to use in
resolving symbols, etc.
Reported-by: Karl Rister <krister@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Fixes: 657ee5531903 ("perf evlist: Introduce side band thread")
Link: https://lkml.kernel.org/n/tip-g79k0jz41adiaeuqud742t2l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/evlist.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 5ad92fa72e78..253dd8dd0e12 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -21,6 +21,7 @@
#include "bpf-event.h"
#include <signal.h>
#include <unistd.h>
+#include <sched.h>
#include "parse-events.h"
#include <subcmd/parse-options.h>
@@ -1824,6 +1825,14 @@ static void *perf_evlist__poll_thread(void *arg)
struct evlist *evlist = arg;
bool draining = false;
int i, done = 0;
+ /*
+ * In order to read symbols from other namespaces perf needs to call
+ * setns(2). This isn't permitted if the fs_struct has multiple users.
+ * unshare(2) the fs so that we may continue to setns into namespaces
+ * that we're observing when, for instance, reading the build-ids at
+ * the end of a 'perf record' session.
+ */
+ unshare(CLONE_FS);
while (!done) {
bool got_data = false;
--
2.21.0