* perf and containers
@ 2017-06-29 20:33 Brendan Gregg
  2017-06-30  2:16 ` Arnaldo Carvalho de Melo
  2017-06-30 17:13 ` Milian Wolff
  0 siblings, 2 replies; 5+ messages in thread
From: Brendan Gregg @ 2017-06-29 20:33 UTC
  To: linux-perf-use.

G'Day perf-users,

I've been using perf with containers a lot, running perf from the host
system-wide or with --cgroup for one container, and I'm wondering
about symbol translation and namespaces. When running "perf script" or
"perf report" from the host:

- kernel symbols: works fine

- JIT symbols: doesn't work as /tmp/perf-PID.map is in the container,
not the host, and has a different PID. I currently have shell scripts
to copy and rename map files, but... Would it be possible for perf to
try opening /tmp/perf-PID.map, and if that fails, check if the PID
still exists in /proc, read its NSpid from /proc/PID/status, and if
that's different (meaning it's in a container), then read
/proc/PID/root/tmp/perf-NSpid.map instead?

- binary symbols: /usr/bin/node,
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java, /lib/..., etc, may not
be in the host, so symbol translation fails. Could perf have a similar
approach where it tries from /proc/PID/root/... instead, if the PID is
in a container?
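
The fallback described in the two bullets above can be sketched in
Python. This is only an illustration under stated assumptions, not
something perf does: the helper names parse_nspid and container_paths
are hypothetical, and the sketch treats "more than one NSpid entry" as
the sign that the PID lives in a nested PID namespace.

```python
def parse_nspid(status_text):
    """Return the PIDs on the NSpid line of /proc/PID/status.

    The kernel lists one PID per nested PID namespace, outermost first,
    so the last entry is the PID as seen inside the container.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def container_paths(host_pid, status_text, binary="/usr/bin/node"):
    """Build fallback paths under /proc/PID/root for a containerized PID.

    Returns None when NSpid has fewer than two entries, i.e. when the
    process is not in a nested PID namespace.
    """
    nspids = parse_nspid(status_text)
    if len(nspids) < 2:
        return None
    root = f"/proc/{host_pid}/root"
    return {
        # JIT map file, named after the PID inside the container.
        "map": f"{root}/tmp/perf-{nspids[-1]}.map",
        # Same binary path, resolved through the container's root.
        "binary": root + binary,
    }
```

A caller would read /proc/PID/status on the host and try these paths
only after the host-side lookup fails. A container that shares the
host PID namespace has a single NSpid entry, so this check alone would
miss it.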

Or, are there other workarounds I don't know about for these? :) perf
could have a --setns flag to set the mount namespace before it
attempts symbol translation, which may also work for when I'm
analyzing one container only from the host, and could run "perf script
--setns=..." for that one container.

Using perf from within the container is totally different, and not
always possible (perf_event_open() can be disabled, and, some
containers are completely locked down, such that it's hard to run any
command that isn't the application)...

Brendan


* Re: perf and containers
  2017-06-29 20:33 perf and containers Brendan Gregg
@ 2017-06-30  2:16 ` Arnaldo Carvalho de Melo
  2017-06-30  7:00   ` Thomas-Mich Richter
  2017-06-30 17:13 ` Milian Wolff
  1 sibling, 1 reply; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-06-30  2:16 UTC
  To: Brendan Gregg; +Cc: linux-perf-use.

On Thu, Jun 29, 2017 at 01:33:37PM -0700, Brendan Gregg wrote:
> G'Day perf-users,
> 
> I've been using perf with containers a lot, running perf from the host
> system-wide or with --cgroup for one container, and I'm wondering
> about symbol translation and namespaces. When running "perf script" or
> "perf report" from the host:
> 
> - kernel symbols: works fine
> 
> - JIT symbols: doesn't work as /tmp/perf-PID.map is in the container,
> not the host, and has a different PID. I currently have shell scripts
> to copy and rename map files, but... Would it be possible for perf to
> try opening /tmp/perf-PID.map, and if that fails, check if the PID
> still exists in /proc, read its NSpid from /proc/PID/status, and if
> that's different (meaning it's in a container), then read
> /proc/PID/root/tmp/perf-NSpid.map instead?
> 
> - binary symbols: /usr/bin/node,
> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java, /lib/..., etc, may not
> be in the host, so symbol translation fails. Could perf have a similar
> approach where it tries from /proc/PID/root/... instead, if the PID is
> in a container?

Yeah, I watched your presentation :-)

We need to fix it properly, using the workarounds you describe as a
starting point. Patches that attack this piecemeal would be more than
welcome; otherwise I'll try to work on this at some point.

- Arnaldo
 
> Or, are there other workarounds I don't know about for these? :) perf
> could have a --setns flag to set the mount namespace before it
> attempts symbol translation, which may also work for when I'm
> analyzing one container only from the host, and could run "perf script
> --setns=..." for that one container.
> 
> Using perf from within the container is totally different, and not
> always possible (perf_event_open() can be disabled, and, some
> containers are completely locked down, such that it's hard to run any
> command that isn't the application)...
> 
> Brendan


* Re: perf and containers
  2017-06-30  2:16 ` Arnaldo Carvalho de Melo
@ 2017-06-30  7:00   ` Thomas-Mich Richter
  2017-06-30 14:55     ` David Ahern
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas-Mich Richter @ 2017-06-30  7:00 UTC
  To: Arnaldo Carvalho de Melo, Brendan Gregg
  Cc: linux-perf-use., Hendrik Brueckner

On 06/30/2017 04:16 AM, Arnaldo Carvalho de Melo wrote:

[...SNIP..]

I ran into a very similar issue when I tried the perf namespace support in the latest kernel, 4.12.
I was able to find the new PERF_RECORD_NAMESPACES entries in the perf.data file
but do not know how to interpret the inode numbers shown in those entries.
Right now the wrong symbols are displayed...

How should the inode numbers that the perf NAMESPACES event emits be interpreted?

I am interested in resolving this.

In my experiment there are 3 files involved: setup.sh, setup-app.sh and hello.

setup.sh: prints its pid and invokes setup-app.sh via the unshare command to get a new mount namespace.
setup-app.sh: prints its pid, replaces /bin/bash with hello via a bind mount, and executes /bin/bash (which in fact executes hello).
hello: prints out the inode number of its mount namespace.

The issue that shows up is detailed below.
In the end, program hello is executed under the name bash, and the perf report tool has no way of knowing that
bash actually refers to hello in that mount namespace, so incorrect symbol names are displayed.

[root@s8360046 perf]# cat setup.sh
echo $0 pid $$
exec unshare -m  ./setup-app.sh

[root@s8360046 perf]# cat setup-app.sh
echo $0 pid $$ 
mount -B ./hello /bin/bash
exec /bin/bash

[root@s8360046 perf]# cat hello.c 
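#include	<stdio.h>
#include	<unistd.h>
#include	<sys/types.h>
#include	<sys/stat.h>

int main(int argc, char **argv)
{
	struct stat st;

	(void)argc;
	printf("%s hallo world pid:%d\n", *argv, (int)getpid());
	/* The inode of /proc/self/ns/mnt identifies this process's mount namespace. */
	if (stat("/proc/self/ns/mnt", &st) == 0) {
		/* dev_t and ino_t widths vary by platform, so cast to match the format. */
		printf("dev_id:%lu ino:%lu\n",
		       (unsigned long)st.st_dev, (unsigned long)st.st_ino);
	}
	return 0;
}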


Now when you run the command 

[root@s8360046 perf]# ~/linux/tools/perf/perf record --namespace -- ./setup.sh
./setup.sh pid 26180
./setup-app.sh pid 26180
/bin/bash hallo world pid:26180
dev_id:3 ino:4026532057
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (114 samples) ]
[root@s8360046 perf]#

Now running 
[root@s8360046 perf]# ~/linux/tools/perf/perf report -D 
highlights the following events:

180378841576860 0x3348 [0x98]: PERF_RECORD_NAMESPACES 26180/26180 - nr_namespaces: 7
                [0/net: 3/0xf000002b, 1/uts: 3/0xf00000da, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc, 
                 4/user: 3/0xeffffffd, 5/mnt: 3/0xf00000d9, 6/cgroup: 3/0xeffffffb]

---> A new mount namespace was created: 0xf00000d9 is 4026532057 in decimal, as printed by program hello. This namespace was created by the command unshare -m.

180378879643837 0x2c78 [0x28]: PERF_RECORD_COMM exec: bash:26180/26180
--> new executable bash is started (which is actually hello)

180378879685931 0x2ca0 [0x68]: PERF_RECORD_MMAP2 26180/26180: [0x1000000(0x1000) @ 0 5e:01 813431 692173762]: r-xp /usr/bin/bash
--> new mmap event for hello, but it refers to bash (due to the bind mount).

I wonder how to find out that the MMAP2 event for /usr/bin/bash actually refers to a totally different file, namely hello in the local directory.
How can the perf report tool find out which namespace to use for the MMAP2 event for /usr/bin/bash?
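
As a small aid for reading the dump above, the following Python sketch pulls the dev/inode pairs out of the PERF_RECORD_NAMESPACES text and confirms that the mnt entry matches what hello printed.
It assumes the text layout shown in this message (decimal device, hexadecimal inode); the name parse_namespaces is hypothetical and not part of perf.

```python
import re

# One entry looks like "5/mnt: 3/0xf00000d9": index/name: dev/inode.
NS_ENTRY = re.compile(r"(\d+)/(\w+): (\d+)/(0x[0-9a-fA-F]+)")

SAMPLE = """\
180378841576860 0x3348 [0x98]: PERF_RECORD_NAMESPACES 26180/26180 - nr_namespaces: 7
                [0/net: 3/0xf000002b, 1/uts: 3/0xf00000da, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
                 4/user: 3/0xeffffffd, 5/mnt: 3/0xf00000d9, 6/cgroup: 3/0xeffffffb]
"""


def parse_namespaces(dump_text):
    """Map each namespace name to its (dev, inode) pair as integers."""
    return {
        name: (int(dev), int(ino, 16))
        for _index, name, dev, ino in NS_ENTRY.findall(dump_text)
    }


namespaces = parse_namespaces(SAMPLE)
print(namespaces["mnt"])
```

The mnt pair is the same dev_id and ino that hello reported from stat("/proc/self/ns/mnt"), so the event does identify the mount namespace. It does not by itself say which file /usr/bin/bash maps to inside it.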

Thanks a lot.
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Chairwoman of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Böblingen / Register court: Amtsgericht Stuttgart, HRB 243294


* Re: perf and containers
  2017-06-30  7:00   ` Thomas-Mich Richter
@ 2017-06-30 14:55     ` David Ahern
  0 siblings, 0 replies; 5+ messages in thread
From: David Ahern @ 2017-06-30 14:55 UTC
  To: Thomas-Mich Richter, Arnaldo Carvalho de Melo, Brendan Gregg
  Cc: linux-perf-use., Hendrik Brueckner

On 6/30/17 1:00 AM, Thomas-Mich Richter wrote:
> On 06/30/2017 04:16 AM, Arnaldo Carvalho de Melo wrote:
> 
> [...SNIP..]
> 
> I ran into a very similar issue when I tried the latest kernel 4.12 perf name space support.
> I was able to find the new PERF_RECORD_NAMESPACE entries in the perf.data file 
> but do not know how to interpret the ino-numbers shown in those entries.
> Right now the wrong symbols will be displayed...
> 
> How to interpret the inode numbers the perf NAMESPACE event emits?

The device and inode number can be used as an identifier (e.g., combine
into a u64) for the namespace when processing events.
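
One way to form such an identifier, sketched in Python. The layout
(device in the upper 32 bits, inode in the lower 32 bits) is an
assumption made for illustration, as are the names namespace_key and
split_key; it only works while both values fit in 32 bits, which holds
for the values in the dump quoted in this thread.

```python
def namespace_key(dev, ino):
    """Pack a (dev, inode) pair into one 64-bit integer key."""
    if not (0 <= dev < 2**32 and 0 <= ino < 2**32):
        raise ValueError("dev and ino must each fit in 32 bits")
    return (dev << 32) | ino


def split_key(key):
    """Recover the (dev, inode) pair from a key built by namespace_key."""
    return key >> 32, key & 0xFFFFFFFF


# Keys for the mnt and uts namespaces from the dump; they differ
# because the inode numbers differ.
mnt_key = namespace_key(3, 0xF00000D9)
uts_key = namespace_key(3, 0xF00000DA)
```

Events can then be grouped by key, for example in a dictionary, so
that samples from different mount namespaces are kept apart.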


* Re: perf and containers
  2017-06-29 20:33 perf and containers Brendan Gregg
  2017-06-30  2:16 ` Arnaldo Carvalho de Melo
@ 2017-06-30 17:13 ` Milian Wolff
  1 sibling, 0 replies; 5+ messages in thread
From: Milian Wolff @ 2017-06-30 17:13 UTC
  To: Brendan Gregg; +Cc: linux-perf-use.

On Thursday, 29 June 2017 22:33:37 CEST Brendan Gregg wrote:
> G'Day perf-users,
> 
> I've been using perf with containers a lot, running perf from the host
> system-wide or with --cgroup for one container, and I'm wondering
> about symbol translation and namespaces. When running "perf script" or
> "perf report" from the host:
> 
> - kernel symbols: works fine
> 
> - JIT symbols: doesn't work as /tmp/perf-PID.map is in the container,
> not the host, and has a different PID. I currently have shell scripts
> to copy and rename map files, but... Would it be possible for perf to
> try opening /tmp/perf-PID.map, and if that fails, check if the PID
> still exists in /proc, read its NSpid from /proc/PID/status, and if
> that's different (meaning it's in a container), then read
> /proc/PID/root/tmp/perf-NSpid.map instead?
> 
> - binary symbols: /usr/bin/node,
> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java, /lib/..., etc, may not
> be in the host, so symbol translation fails. Could perf have a similar
> approach where it tries from /proc/PID/root/... instead, if the PID is
> in a container?
> 
> Or, are there other workarounds I don't know about for these? :) perf
> could have a --setns flag to set the mount namespace before it
> attempts symbol translation, which may also work for when I'm
> analyzing one container only from the host, and could run "perf script
> --setns=..." for that one container.
> 
> Using perf from within the container is totally different, and not
> always possible (perf_event_open() can be disabled, and, some
> containers are completely locked down, such that it's hard to run any
> command that isn't the application)...

It's a bit hacky, but in the past I resorted to passing the mount point of the 
container root to perf via `--symfs`. That made it work quite nicely for my 
use case, but I also recorded from within the container - not sure if that 
plays a role here.
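
For illustration, a minimal Python sketch that assembles such a 
command line. The function name perf_report_argv and the example root 
path are hypothetical; only `perf report`, `-i` and `--symfs` are real 
perf options, and whether symbols resolve still depends on how the 
data was recorded.

```python
import shlex


def perf_report_argv(container_root, data_file="perf.data"):
    """Build a perf report command that resolves symbols under container_root."""
    return ["perf", "report", "-i", data_file, "--symfs", container_root]


# Hypothetical mount point of the container's root filesystem on the host.
argv = perf_report_argv("/var/lib/containers/demo/rootfs")
print(shlex.join(argv))
```

The list form can be passed to subprocess.run() unchanged, which 
avoids shell quoting problems if the mount point contains spaces.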

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
