* Re: Perf support for interpreted and Just-In-Time translated languages
@ 2014-12-05 20:18 Carl Love
  2014-12-05 21:27 ` Brendan Gregg
  0 siblings, 1 reply; 29+ messages in thread
From: Carl Love @ 2014-12-05 20:18 UTC (permalink / raw)
  To: linux-perf-users

> On 12/02/2014 08:36 PM, Brendan Gregg wrote:
>> G'Day Will,
>>
>> On Tue, Dec 2, 2014 at 1:08 PM, William Cohen <wcohen@redhat.com> wrote:
>>> perf makes use of the debug information provided by the compilers to
>>> map the addresses observed in the instruction pointer and on the stack
>>> back to source code. This works very well for traditional compiled
>>> programs written in c and c++. However, the assumption that the
>>> instruction address maps back to something the user wrote is not true
>>> for code written in interpreted languages such as python, perl, and
>>> Ruby or for Just-In-Time (JIT) runtime environment commonly used for
>>> Java. The addresses would either map back to the interpreter runtime
>>> or dynamically generated code. It would be really nice if perf was
>>> enhanced to provide data about where in the interpreted and JIT'ed
>>> code the processor was spending time.

I wholeheartedly agree. The ability to profile Java JITed code is a very big
deal for some perf users. I think perf should provide its own solution for
profiling Java JITed code that is well designed and well documented, instead of
directing users to something out-of-tree and out of perf's sphere of control.

>>
>> perf supports the /tmp/perf-PID.map files for JIT translations. It's
>> up to the runtimes to create these files.
>>
>> I was enhancing the Java perf-map-agent today
>> (https://github.com/jrudolph/perf-map-agent), and using it with perf.

Thanks for the pointer. I didn't know about this tool before. It's cool that
it has the ability to attach to a running JVM and create a /tmp/perf-<pid>.map
file -- i.e., can capture profile data without having to start the JVM with the
-agentpath or -agentlib option. But the downside is (as the documentation says) ...
"Over time the JVM will JIT compile more methods and the perf-<pid>.map file
will become stale. You need to rerun perf-java to generate a new and current map."

>> perf doesn't seem to handle map files that grow (and overwrite
>> symbols) very well, so I had to create an extra step that cleaned up
>> the map file. I should write up the Java instructions somewhere.

Yes, oprofile has to handle that as well. It keeps track of how long
each symbol resides at the overwritten address, and then chooses the
one that was resident the longest to attribute samples to. It's of course not
perfect, but it's probably reasonable to do so. The oprofile user manual
explains this (http://oprofile.sourceforge.net/doc/overlapping-symbols.html).

>>
>> I did do a writeup for Node.js, whose v8 engine supports the perf map
>> files. See: http://www.brendangregg.com/blog/2014-09-17/node-flame-graphs-on-linux.html
>>
>> Also see tools/perf/Documentation/jit-interface.txt
>>
>>> OProfile provides the ability to map samples from Java Runtime
>>> Environment (JRE) JIT code using a shared library agent loaded when
>>> the program starts executing. The shared library uses the JVMTI or
>>> JVMPI interface to note the method that each region of JIT'ed code
>>> maps to. This is later used to map the instruction pointer back to
>>> the appropriate Java method. There is some information on how this is
>>> implemented at http://oprofile.sourceforge.net/doc/devel/index.html.
>>
>> Yes, that's exactly what perf-map-agent does (JVMTI). I only just

Similar, but not exactly. OProfile's Java agent library is passed to the JVM
on startup and is continuously used throughout the JVM's run time. It would be
ideal to have both this functionality and the attach functionality of perf-map-agent.

> OProfile provides two implementations of VM-specific libs -- one for pre-1.5 Java
> (using JVMPI interface) and another for 1.5 and later Java (using JVMTI interface).
> I know there are some other VM-specific agent libs that have been written (for mono
> and LLVM), but don't know how much they are used -- they were not contributed to
> oprofile.

>> created the pull request, but if you try perf-map-agent, you'll want
>> to use the fflush fix to avoid buffering lag
>> (https://github.com/jrudolph/perf-map-agent/pull/8).

There are a couple other issues with the current techniques used by perf for profiling
JITed code (unless I'm missing something):
  - When are the /tmp/perf-<pid>.map files deleted?
  - How does this work for the offline analysis scenario (i.e., using 'perf archive')?
    Would the /tmp/perf-<pid>.map files have to be copied over to the host system where
    the analysis is being done?

Carl Love

^ permalink raw reply	[flat|nested] 29+ messages in thread
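A minimal sketch of what a runtime has to emit for the /tmp/perf-<pid>.map mechanism discussed in this message, assuming the line format described in tools/perf/Documentation/jit-interface.txt (hex START, hex SIZE, then the symbol name, separated by spaces, no 0x prefix). The helper names and the example symbol are hypothetical and are not taken from perf-map-agent; the demo writes to a temporary directory instead of /tmp.

```python
import os
import tempfile


def format_map_line(start, size, name):
    """Return one map entry: hex START, hex SIZE (no 0x prefix), then the name."""
    return f"{start:x} {size:x} {name}\n"


def append_symbol(map_path, start, size, name):
    """Append one entry and flush right away, so a reader is not looking at
    stale buffered output (the kind of lag the fflush fix in the thread targets)."""
    with open(map_path, "a") as f:
        f.write(format_map_line(start, size, name))
        f.flush()


if __name__ == "__main__":
    # A real runtime would write /tmp/perf-<pid>.map; a temp dir keeps the demo harmless.
    path = os.path.join(tempfile.mkdtemp(), f"perf-{os.getpid()}.map")
    append_symbol(path, 0x7F3A10000000, 0x40, "Ljava/lang/String;::hashCode")
    with open(path) as f:
        print(f.read(), end="")
```

Because entries are only ever appended here, a file written this way grows over time and can contain several symbols for the same address range, which is the staleness problem raised in the message.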
* Re: Perf support for interpreted and Just-In-Time translated languages
  2014-12-05 20:18 Perf support for interpreted and Just-In-Time translated languages Carl Love
@ 2014-12-05 21:27 ` Brendan Gregg
  2014-12-09 20:34   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 29+ messages in thread
From: Brendan Gregg @ 2014-12-05 21:27 UTC (permalink / raw)
  To: Carl Love; +Cc: linux-perf-use.

G'Day Carl,

On Fri, Dec 5, 2014 at 12:18 PM, Carl Love <cel@us.ibm.com> wrote:
>
> > On 12/02/2014 08:36 PM, Brendan Gregg wrote:
> >> G'Day Will,
> >>
> >> On Tue, Dec 2, 2014 at 1:08 PM, William Cohen <wcohen@redhat.com> wrote:
> >>> perf makes use of the debug information provided by the compilers to
> >>> map the addresses observed in the instruction pointer and on the stack
> >>> back to source code. This works very well for traditional compiled
> >>> programs written in c and c++. However, the assumption that the
> >>> instruction address maps back to something the user wrote is not true
> >>> for code written in interpreted languages such as python, perl, and
> >>> Ruby or for Just-In-Time (JIT) runtime environment commonly used for
> >>> Java. The addresses would either map back to the interpreter runtime
> >>> or dynamically generated code. It would be really nice if perf was
> >>> enhanced to provide data about where in the interpreted and JIT'ed
> >>> code the processor was spending time.
>
> I wholeheartedly agree. The ability to profile Java JITed code is a very big
> deal for some perf users. I think perf should provide its own solution for
> profiling Java JITed code that is well designed and well documented, instead of
> directing users to something out-of-tree and out of perf's sphere of control.

I posted a hotspot patch yesterday:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016477.html

Along with perf-map-agent for symbol translation, this lets perf
profile Java (noting the caveats in that email).

... There are a lot of exotic things we can do with perf, but I don't
think CPU stack profiling is one of them. I think if you have perf &
java, it should just work.

> >>
> >> perf supports the /tmp/perf-PID.map files for JIT translations. It's
> >> up to the runtimes to create these files.
> >>
> >> I was enhancing the Java perf-map-agent today
> >> (https://github.com/jrudolph/perf-map-agent), and using it with perf.
>
> Thanks for the pointer. I didn't know about this tool before. It's cool that
> it has the ability to attach to a running JVM and create a /tmp/perf-<pid>.map
> file -- i.e., can capture profile data without having to start the JVM with the
> -agentpath or -agentlib option. But the downside is (as the documentation says) ...
> "Over time the JVM will JIT compile more methods and the perf-<pid>.map file
> will become stale. You need to rerun perf-java to generate a new and current map."

FWIW, there's a lot of churn in the first few minutes of java running
hot, as methods get compiled, but I've seen it settle down after 5
minutes for my workload. Still, it's something I'm keeping an eye on.
I can, at least, generate a map before and after profiling, and look
for changes.

>
>
> >> perf doesn't seem to handle map files that grow (and overwrite
> >> symbols) very well, so I had to create an extra step that cleaned up
> >> the map file. I should write up the Java instructions somewhere.
>
> Yes, oprofile has to handle that as well. It keeps track of how long
> each symbol resides at the overwritten address, and then chooses the
> one that was resident the longest to attribute samples to. It's of course not
> perfect, but it's probably reasonable to do so. The oprofile user manual
> explains this (http://oprofile.sourceforge.net/doc/overlapping-symbols.html).

Hm, that is a bit odd. My dumb solution would have been to detect
symbols that have changed during profiling, and flag them in the
profile so the end-user would know to be dubious. The percentage is
pretty small, but YMMV.

If we were to look at timing, why not have JVMTI emit timestamped
method symbols, and then correlate to perf's timestamped samples.

>
> >>
> >> I did do a writeup for Node.js, whose v8 engine supports the perf map
> >> files. See: http://www.brendangregg.com/blog/2014-09-17/node-flame-graphs-on-linux.html
> >>
> >> Also see tools/perf/Documentation/jit-interface.txt
> >>
> >>> OProfile provides the ability to map samples from Java Runtime
> >>> Environment (JRE) JIT code using a shared library agent loaded when
> >>> the program starts executing. The shared library uses the JVMTI or
> >>> JVMPI interface to note the method that each region of JIT'ed code
> >>> maps to. This is later used to map the instruction pointer back to
> >>> the appropriate Java method. There is some information on how this is
> >>> implemented at http://oprofile.sourceforge.net/doc/devel/index.html.
> >>
> >> Yes, that's exactly what perf-map-agent does (JVMTI). I only just
>
> Similar, but not exactly. OProfile's Java agent library is passed to the JVM
> on startup and is continuously used throughout the JVM's run time. It would be
> ideal to have both this functionality and the attach functionality of perf-map-agent.

Actually, that is what perf-map-agent did do when I wrote this. :) It
was just changed, so that it now emits the map file on demand.

A motivating factor to change this was that the map file grew in such
a way that it confused perf_events, which didn't translate properly. I
haven't debugged it, but I suspect perf_events expects a sorted map
file, which this wasn't. I wrote a perl tool to tidy up the map file,
which made perf_events then work correctly. The other solution, which
is what perf-map-agent now does, is just to dump the whole map file on
demand, rather than growing it over time.

>
> > OProfile provides two implementations of VM-specific libs -- one for pre-1.5 Java
> > (using JVMPI interface) and another for 1.5 and later Java (using JVMTI interface).
> > I know there are some other VM-specific agent libs that have been written (for mono
> > and LLVM), but don't know how much they are used -- they were not contributed to
> > oprofile.
> >> created the pull request, but if you try perf-map-agent, you'll want
> >> to use the fflush fix to avoid buffering lag
> >> (https://github.com/jrudolph/perf-map-agent/pull/8).
>
> There are a couple other issues with the current techniques used by perf for profiling
> JITed code (unless I'm missing something):
>   - When are the /tmp/perf-<pid>.map files deleted?

That's up to the runtime agent. Currently never, so your /tmp slowly fills!

>   - How does this work for the offline analysis scenario (i.e., using 'perf archive')?
>     Would the /tmp/perf-<pid>.map files have to be copied over to the host system where
>     the analysis is being done?

Yes. I keep copies of the perf.map along with the perf.data. It might
be worth having an option to perf to change the base path for these
maps, so that I didn't have to keep putting them in /tmp.

Brendan

^ permalink raw reply	[flat|nested] 29+ messages in thread
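The map-file clean-up step mentioned in this message could look roughly like the sketch below. This is not Brendan's perl tool, whose logic is not shown in the thread; it is a hypothetical stand-in that assumes entries appear in the order the agent appended them, so that a later entry supersedes any earlier entry whose address range it overlaps. The function names `parse_map` and `tidy` are made up for illustration.

```python
def parse_map(lines):
    """Parse 'START SIZE name' lines (hex, no 0x) into (start, size, name) tuples."""
    entries = []
    for line in lines:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        entries.append((int(parts[0], 16), int(parts[1], 16), parts[2]))
    return entries


def tidy(entries):
    """Keep the newest entry wherever ranges overlap, then sort by start address.

    Assumption: list order is append order, so later entries are newer.
    The overlap check is quadratic, which is fine for a sketch.
    """
    kept = []
    for start, size, name in reversed(entries):  # walk newest first
        end = start + size
        # Keep this entry only if it overlaps nothing that is newer.
        if all(end <= s or start >= s + z for s, z, _ in kept):
            kept.append((start, size, name))
    return sorted(kept)


if __name__ == "__main__":
    raw = ["1000 20 old_method\n", "2000 10 other\n", "1010 20 new_method\n"]
    for start, size, name in tidy(parse_map(raw)):
        print(f"{start:x} {size:x} {name}")
```

Dropping the older symbol outright loses information about samples taken before the overwrite; the timestamp-based ideas later in the thread address that.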
* Re: Perf support for interpreted and Just-In-Time translated languages
  2014-12-05 21:27 ` Brendan Gregg
@ 2014-12-09 20:34   ` Arnaldo Carvalho de Melo
  2014-12-09 22:01     ` Andi Kleen
  2014-12-10  7:55     ` Perf support for interpreted and Just-In-Time translated languages Pekka Enberg
  0 siblings, 2 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-12-09 20:34 UTC (permalink / raw)
  To: Brendan Gregg; +Cc: Carl Love, Pekka Enberg, linux-perf-use.

On Fri, Dec 05, 2014 at 01:27:36PM -0800, Brendan Gregg wrote:
> G'Day Carl,
>
> On Fri, Dec 5, 2014 at 12:18 PM, Carl Love <cel@us.ibm.com> wrote:
> >
> > > On 12/02/2014 08:36 PM, Brendan Gregg wrote:
> > >> G'Day Will,
> > >>
> > >> On Tue, Dec 2, 2014 at 1:08 PM, William Cohen <wcohen@redhat.com> wrote:
> > >>> perf makes use of the debug information provided by the compilers to
> > >>> map the addresses observed in the instruction pointer and on the stack
> > >>> back to source code. This works very well for traditional compiled
> > >>> programs written in c and c++. However, the assumption that the
> > >>> instruction address maps back to something the user wrote is not true
> > >>> for code written in interpreted languages such as python, perl, and
> > >>> Ruby or for Just-In-Time (JIT) runtime environment commonly used for
> > >>> Java. The addresses would either map back to the interpreter runtime
> > >>> or dynamically generated code. It would be really nice if perf was
> > >>> enhanced to provide data about where in the interpreted and JIT'ed
> > >>> code the processor was spending time.
> >
> > I wholeheartedly agree. The ability to profile Java JITed code is a very big
> > deal for some perf users. I think perf should provide its own solution for
> > profiling Java JITed code that is well designed and well documented, instead of
> > directing users to something out-of-tree and out of perf's sphere of control.
>
> I posted a hotspot patch yesterday:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016477.html
>
> Along with perf-map-agent for symbol translation, this lets perf
> profile Java (noting the caveats in that email).
>
> ... There are a lot of exotic things we can do with perf, but I don't
> think CPU stack profiling is one of them. I think if you have perf &
> java, it should just work.
>
> > >>
> > >> perf supports the /tmp/perf-PID.map files for JIT translations. It's
> > >> up to the runtimes to create these files.
> > >>
> > >> I was enhancing the Java perf-map-agent today
> > >> (https://github.com/jrudolph/perf-map-agent), and using it with perf.
> >
> > Thanks for the pointer. I didn't know about this tool before. It's cool that
> > it has the ability to attach to a running JVM and create a /tmp/perf-<pid>.map
> > file -- i.e., can capture profile data without having to start the JVM with the
> > -agentpath or -agentlib option. But the downside is (as the documentation says) ...
> > "Over time the JVM will JIT compile more methods and the perf-<pid>.map file
> > will become stale. You need to rerun perf-java to generate a new and current map."
>
> FWIW, there's a lot of churn in the first few minutes of java running
> hot, as methods get compiled, but I've seen it settle down after 5
> minutes for my workload. Still, it's something I'm keeping an eye on.
> I can, at least, generate a map before and after profiling, and look
> for changes.

Humm, I wonder if we could try to attach a 'perf probe' (uprobes) to
some JVM method that is known to invalidate JITted code -> symtab
mappings so that we would use it as a PERF_RECORD_MMAP equivalent...
I.e. we would know that that map overlaps the previous one and that the
symtab is a new one for that addr range, etc, just like we do for
executable mmaps coming from the kernel (PERF_RECORD_MMAP).

> > >> perf doesn't seem to handle map files that grow (and overwrite
> > >> symbols) very well, so I had to create an extra step that cleaned up
> > >> the map file. I should write up the Java instructions somewhere.
> >
> > Yes, oprofile has to handle that as well. It keeps track of how long
> > each symbol resides at the overwritten address, and then chooses the
> > one that was resident the longest to attribute samples to. It's of course not
> > perfect, but it's probably reasonable to do so. The oprofile user manual
> > explains this (http://oprofile.sourceforge.net/doc/overlapping-symbols.html).

> Hm, that is a bit odd. My dumb solution would have been to detect
> symbols that have changed during profiling, and flag them in the
> profile so the end-user would know to be dubious. The percentage is
> pretty small, but YMMV.

> If we were to look at timing, why not have JVMTI emit timestamped
> method symbols, and then correlate to perf's timestamped samples.

I think we need just to intercept mmap reuses, somehow... I wonder if
this is not a dtrace tracepoint (or whatever that may be named in dtrace
land).

> > >> I did do a writeup for Node.js, whose v8 engine supports the perf map
> > >> files. See: http://www.brendangregg.com/blog/2014-09-17/node-flame-graphs-on-linux.html
> > >>
> > >> Also see tools/perf/Documentation/jit-interface.txt
> > >>
> > >>> OProfile provides the ability to map samples from Java Runtime
> > >>> Environment (JRE) JIT code using a shared library agent loaded when
> > >>> the program starts executing. The shared library uses the JVMTI or
> > >>> JVMPI interface to note the method that each region of JIT'ed code
> > >>> maps to. This is later used to map the instruction pointer back to
> > >>> the appropriate Java method. There is some information on how this is
> > >>> implemented at http://oprofile.sourceforge.net/doc/devel/index.html.
> > >>
> > >> Yes, that's exactly what perf-map-agent does (JVMTI). I only just
> >
> > Similar, but not exactly. OProfile's Java agent library is passed to the JVM
> > on startup and is continuously used throughout the JVM's run time. It would be
> > ideal to have both this functionality and the attach functionality of perf-map-agent.
>
> Actually, that is what perf-map-agent did do when I wrote this. :) It
> was just changed, so that it now emits the map file on demand.
>
> A motivating factor to change this was that the map file grew in such
> a way that it confused perf_events, which didn't translate properly. I
> haven't debugged it, but I suspect perf_events expects a sorted map
> file, which this wasn't. I wrote a perl tool to tidy up the map file,

It shouldn't, as it goes on reading and adding it to an rbtree, which
sorts the symbols so that later we can look up by addr.

> which made perf_events then work correctly. The other solution, which
> is what perf-map-agent now does, is just to dump the whole map file on
> demand, rather than growing it over time.

> > > OProfile provides two implementations of VM-specific libs -- one for pre-1.5 Java
> > > (using JVMPI interface) and another for 1.5 and later Java (using JVMTI interface).
> > > I know there are some other VM-specific agent libs that have been written (for mono
> > > and LLVM), but don't know how much they are used -- they were not contributed to
> > > oprofile.
> > >> created the pull request, but if you try perf-map-agent, you'll want
> > >> to use the fflush fix to avoid buffering lag
> > >> (https://github.com/jrudolph/perf-map-agent/pull/8).
> >
> > There are a couple other issues with the current techniques used by perf for profiling
> > JITed code (unless I'm missing something):
> >   - When are the /tmp/perf-<pid>.map files deleted?
>
> That's up to the runtime agent. Currently never, so your /tmp slowly fills!
>
> >   - How does this work for the offline analysis scenario (i.e., using 'perf archive')?
> >     Would the /tmp/perf-<pid>.map files have to be copied over to the host system where
> >     the analysis is being done?
>
> Yes. I keep copies of the perf.map along with the perf.data. It might
> be worth having an option to perf to change the base path for these
> maps, so that I didn't have to keep putting them in /tmp.

Right, this was not really designed, was just a proof of concept for
JATO needs, right Pekka?

- Arnaldo

^ permalink raw reply	[flat|nested] 29+ messages in thread
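Arnaldo's point that input order should not matter, because symbols are inserted into a sorted structure and looked up by address afterwards, can be illustrated as follows. This is only a sketch: perf itself uses an rbtree in C, whereas this stand-in uses a sorted Python list with `bisect`, and the `SymbolTable` class is a hypothetical name introduced here.

```python
import bisect


class SymbolTable:
    """Symbols kept sorted by start address; a stand-in for an rbtree."""

    def __init__(self):
        self._starts = []  # sorted start addresses
        self._syms = []    # parallel list of (start, end, name)

    def insert(self, start, size, name):
        """Insert in sorted position, whatever order the entries arrive in."""
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._syms.insert(i, (start, start + size, name))

    def lookup(self, addr):
        """Return the name of the symbol whose range contains addr, or None.

        Only the entry with the nearest start at or below addr is checked,
        so overlapping entries are not disambiguated here.
        """
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, name = self._syms[i]
            if start <= addr < end:
                return name
        return None


if __name__ == "__main__":
    table = SymbolTable()
    table.insert(0x2000, 0x10, "second")  # deliberately inserted out of order
    table.insert(0x1000, 0x20, "first")
    print(table.lookup(0x1005), table.lookup(0x2000), table.lookup(0x1800))
```

With unsorted input handled this way, the remaining failure mode is overlapping entries for a reused address range, which a plain address-keyed lookup cannot resolve without extra information such as timestamps.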
* Re: Perf support for interpreted and Just-In-Time translated languages
  2014-12-09 20:34   ` Arnaldo Carvalho de Melo
@ 2014-12-09 22:01     ` Andi Kleen
  2014-12-09 22:22       ` Perf support for interpreted and Just-In-Time translated olanguages Arnaldo Carvalho de Melo
  2014-12-10  7:55     ` Perf support for interpreted and Just-In-Time translated languages Pekka Enberg
  1 sibling, 1 reply; 29+ messages in thread
From: Andi Kleen @ 2014-12-09 22:01 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Brendan Gregg, Carl Love, Pekka Enberg, linux-perf-use.

Arnaldo Carvalho de Melo <acme@kernel.org> writes:
>
> Humm, I wonder if we could try to attach a 'perf probe' (uprobes) to
> some JVM method that is known to invalidate JITted code -> symtab
> mappings so that we would use it as a PERF_RECORD_MMAP equivalent...
> I.e. we would know that that map overlaps the previous one and that the
> symtab is a new one for that addr range, etc, just like we do for
> executable mmaps coming from the kernel (PERF_RECORD_MMAP).

Java already has an API to get all this information. That is
what oprofile, Vtune and Brendan's agent use.

It just needs a better interface from the agent to perf, to pass all
needed information, including symbols, line numbers, executable code
(for PT decoding and for showing disassembler), and ordering it by time
so that no hacks are needed.

BTW other JITs (LLVM, Mono, V8, ...) have similar interfaces.

Longer term as the kernel gets more JITed (eBPF etc.) it likely needs
some kind of JIT interface too.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages
  2014-12-09 22:01     ` Andi Kleen
@ 2014-12-09 22:22       ` Arnaldo Carvalho de Melo
  2014-12-10  0:38         ` Andi Kleen
  2014-12-10 17:32         ` Andi Kleen
  0 siblings, 2 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-12-09 22:22 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Brendan Gregg, Carl Love, Pekka Enberg, linux-perf-use.

On Tue, Dec 09, 2014 at 02:01:11PM -0800, Andi Kleen wrote:
> Arnaldo Carvalho de Melo <acme@kernel.org> writes:
> >
> > Humm, I wonder if we could try to attach a 'perf probe' (uprobes) to
> > some JVM method that is known to invalidate JITted code -> symtab
> > mappings so that we would use it as a PERF_RECORD_MMAP equivalent...
> > I.e. we would know that that map overlaps the previous one and that the
> > symtab is a new one for that addr range, etc, just like we do for
> > executable mmaps coming from the kernel (PERF_RECORD_MMAP).
>
> JAVA already has a API to get all these information. That is
> what oprofile, Vtune and Brendan's agent uses.

I understood that there is a way to ask for the current JITted code ->
symtab, my question was specifically about how to get notifications when
those mappings change.

The described solution states that those maps can get stale, i.e. they
will change and we don't get a notification for that.

When we use mmap(addr, len, PROT_EXEC) the kernel has a meta event
called PERF_RECORD_MMAP that will record addr range, symtab DSO path,
and we ask it to be timestamped, how to do that for the equivalent part
in the JVM?

My initial thought was to find that using perf probe and insert there a
probe point, but I think that there may be already an existing
tracepoint in the jvm for that, one that, from what I've read so far,
is _not_ being used by this java perf agent, right?

From what I understood, how would it insert that event into the
perf.data event stream? Only if it necessarily involved a new mmap, via
the kernel, etc.

> It just needs a better interface from the agent to perf, to pass all
> needed information, including symbols, line numbers, executable code
> (for PT decoding and for showing diassembler), and ordering it by time
> so that no hacks are needed.
>
> BTW other JITs (LLVM, Mono, V8, ...) have similar interfaces.
>
> Longer term as the kernel gets more JITed (eBPF etc.) it likely needs
> some kind of JIT interface too.

Right, this is something we need to have, no questions about it, it's
just a matter of cooking up some prototype implementation...

If, for instance, the java agent would put on some file those events,
timestamped, then when in perf report we would just insert them into the
event stream as synthesized PERF_RECORD_MMAPs, probably that would be
enough.

> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 29+ messages in thread
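The idea of having the agent log timestamped symbol events and then correlating them with perf's timestamped samples can be sketched as below. The record layout `(timestamp, start, size, name)` and the function `resolve` are assumptions made for illustration only; the thread does not define a file format, and this sketch ignores unload events and clock-source differences between the agent and perf.

```python
def resolve(sample_time, addr, load_events):
    """Return the symbol covering addr that was loaded most recently at or
    before sample_time, or None if nothing matching had been loaded yet.

    load_events: iterable of (timestamp, start, size, name) tuples, in any order.
    """
    best = None  # (timestamp, name) of the best candidate so far
    for ts, start, size, name in load_events:
        if ts <= sample_time and start <= addr < start + size:
            if best is None or ts >= best[0]:
                best = (ts, name)
    return best[1] if best else None


if __name__ == "__main__":
    # Two methods that occupied the same address range at different times.
    events = [
        (10, 0x1000, 0x40, "Startup.init"),
        (50, 0x1000, 0x40, "Worker.hotLoop"),
    ]
    for t in (5, 20, 60):
        print(t, resolve(t, 0x1010, events))
```

Treating each load record like a PERF_RECORD_MMAP that supersedes earlier mappings of the same range is what makes an early sample resolve to the early symbol and a late sample to the late one.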
* Re: Perf support for interpreted and Just-In-Time translated olanguages
  2014-12-09 22:22       ` Perf support for interpreted and Just-In-Time translated olanguages Arnaldo Carvalho de Melo
@ 2014-12-10  0:38         ` Andi Kleen
  2014-12-10 17:41           ` Carl Love
  2014-12-10 19:19           ` Arnaldo Carvalho de Melo
  2014-12-10 17:32         ` Andi Kleen
  1 sibling, 2 replies; 29+ messages in thread
From: Andi Kleen @ 2014-12-10  0:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Brendan Gregg, Carl Love, Pekka Enberg, linux-perf-use.

Arnaldo Carvalho de Melo <acme@kernel.org> writes:

> When we use mmap(addr, len, PROT_EXEC) the kernel has a meta event
> called PERF_RECORD_MMAP that will record addr range, symtab DSO path,
> and we ask it to be timestamped, how to do that for the equivalent part
> in the JVM?

http://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html

You just need a way to push it from the agent to perf.

>
> My initial thought was to find that using perf probe and insert there a
> probe point, but I think that there may be already an existing
> tracepoint in the jvm for that, one that, from what I've read so far,
> is _not_ being used by this java perf agent, right?

The agent interface works by linking an agent with a special library
I believe. So the agent can do whatever it wants.

> From what I understood, how would it insert that event into the
> perf.data event stream? Only if it necessarily involved a new mmap, via
> the kernel, etc.

You basically need a way to trigger a perf event from user space.
One simple way would be to just write a perf.data too into /tmp,
and let perf collect that.

But it's more than that. You also need to transfer the executable
code somewhere, and the line numbers.

>
>> It just needs a better interface from the agent to perf, to pass all
>> needed information, including symbols, line numbers, executable code
>> (for PT decoding and for showing disassembler), and ordering it by time
>> so that no hacks are needed.
>>
>> BTW other JITs (LLVM, Mono, V8, ...) have similar interfaces.
>>
>> Longer term as the kernel gets more JITed (eBPF etc.) it likely needs
>> some kind of JIT interface too.
>
> Right, this is something we need to have, no questions about it, it's
> just a matter of cooking up some prototype implementation...
>
> If, for instance, the java agent would put on some file those events,
> timestamped, then when in perf report we would just insert them into the
> event stream as synthesized PERF_RECORD_MMAPs, probably that would be
> enough.

Really to be useful you need at least line numbers, better code.

In theory the agent could write ELF files with dwarf, but that may get
really ugly. Probably better to have something more light weight.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages
  2014-12-10  0:38         ` Andi Kleen
@ 2014-12-10 17:41           ` Carl Love
  2014-12-10 18:09             ` Andi Kleen
  2014-12-10 19:21             ` Arnaldo Carvalho de Melo
  2014-12-10 19:19           ` Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 29+ messages in thread
From: Carl Love @ 2014-12-10 17:41 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Brendan Gregg, Carl Love, Pekka Enberg, linux-perf-use.

On Tue, 2014-12-09 at 16:38 -0800, Andi Kleen wrote:
> Arnaldo Carvalho de Melo <acme@kernel.org> writes:
>
> > When we use mmap(addr, len, PROT_EXEC) the kernel has a meta event
> > called PERF_RECORD_MMAP that will record addr range, symtab DSO path,
> > and we ask it to be timestamped, how to do that for the equivalent part
> > in the JVM?
>
> http://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html
>
> You just need a way to push it from the agent to perf.
>

This may be more complex than needed. We really just need to track the
symbol. From talking with the Java experts, they say the reused memory
is almost always from methods that are loaded/used early when
initializing the program. Then the memory gets reused by a method that
runs for the bulk of the time. So by getting a timestamp for when the
symbol was loaded you can then determine the length of time each symbol
was in the anonymous memory. Then just associate the sample with the
symbol that was present in the anonymous memory the longest. You can
also tag the symbol names to say that they reside in the same memory,
i.e., the memory was reused, to warn the user that there may be some
ambiguity in the sample to symbol association.

> >
> > My initial thought was to find that using perf probe and insert there a
> > probe point, but I think that there may be already an existing
> > tracepoint in the jvm for that, one that, from what I've read so far,
> > is _not_ being used by this java perf agent, right?
>
> The agent interface works by linking an agent with a special library
> I believe. So the agent can do whatever it wants.

> > From what I understood, how would it insert that event into the
> > perf.data event stream? Only if it necessarily involved a new mmap, via
> > the kernel, etc.
>
> You basically need a way to trigger a perf event from user space.
> One simple way would be to just write a perf.data too into /tmp,
> and let perf collect that.
>
> But it's more than that. You also need to transfer the executable
> code somewhere, and the line numbers.
>

The source file and line number information is optionally available from
the jvm via the jvmti compiled method load callback. With this
information on the symbol, line number and source file, perf could then
persist the data in an ELF formatted file.

> >
> >> It just needs a better interface from the agent to perf, to pass all
> >> needed information, including symbols, line numbers, executable code
> >> (for PT decoding and for showing disassembler), and ordering it by time
> >> so that no hacks are needed.
> >>
> >> BTW other JITs (LLVM, Mono, V8, ...) have similar interfaces.
> >>
> >> Longer term as the kernel gets more JITed (eBPF etc.) it likely needs
> >> some kind of JIT interface too.
> >
> > Right, this is something we need to have, no questions about it, it's
> > just a matter of cooking up some prototype implementation...
> >
> > If, for instance, the java agent would put on some file those events,
> > timestamped, then when in perf report we would just insert them into the
> > event stream as synthesized PERF_RECORD_MMAPs, probably that would be
> > enough.
>
> Really to be useful you need at least line numbers, better code.
>
> In theory the agent could write ELF files with dwarf, but that may get
> really ugly. Probably better to have something more light weight.

Perf would need to do this so the data collection and ELF file
conversion will be done only when perf is doing the profiling of the
JITed code. Specifically, if perf is passed the pid of a running JVM the
agent library would not know when profiling starts and stops.

                    Carl Love

^ permalink raw reply	[flat|nested] 29+ messages in thread
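The longest-resident attribution Carl describes, together with the suggested ambiguity tag, might be sketched like this. It is an illustration under simplifying assumptions, not oprofile's actual implementation: symbols are grouped by identical start address rather than by overlapping range, a symbol is taken to be resident until the next load at the same address (or until `end_time`), and the names `longest_resident` and the event tuple layout are hypothetical.

```python
from collections import defaultdict


def longest_resident(load_events, end_time):
    """Map each start address to (symbol_name, ambiguous).

    load_events: list of (timestamp, start, name), sorted by timestamp.
    ambiguous is True when the address was reused by more than one symbol,
    so the user can be warned about the sample-to-symbol association.
    """
    by_addr = defaultdict(list)
    for ts, start, name in load_events:
        by_addr[start].append((ts, name))

    result = {}
    for start, loads in by_addr.items():
        durations = []
        for i, (ts, name) in enumerate(loads):
            # Resident until the next load at this address, else until end_time.
            until = loads[i + 1][0] if i + 1 < len(loads) else end_time
            durations.append((until - ts, name))
        _, winner = max(durations, key=lambda d: d[0])
        result[start] = (winner, len(loads) > 1)
    return result


if __name__ == "__main__":
    events = [
        (0, 0x1000, "Startup.init"),
        (1, 0x2000, "Util.helper"),
        (5, 0x1000, "Worker.hotLoop"),
    ]
    for addr, (name, ambiguous) in sorted(longest_resident(events, 100).items()):
        print(f"{addr:x} {name}{' [reused]' if ambiguous else ''}")
```

This matches the scenario in the message, where an initialization-time method is replaced by one that runs for the bulk of the profile, but it will misattribute any samples that really did land in the short-lived symbol.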
* Re: Perf support for interpreted and Just-In-Time translated olanguages
  2014-12-10 17:41           ` Carl Love
@ 2014-12-10 18:09             ` Andi Kleen
  2014-12-10 19:21             ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 29+ messages in thread
From: Andi Kleen @ 2014-12-10 18:09 UTC (permalink / raw)
  To: Carl Love
  Cc: Arnaldo Carvalho de Melo, Andi Kleen, Brendan Gregg, Pekka Enberg,
	linux-perf-use.

> This may be more complex than needed. We really just need to track the
> symbol. From talking with the Java experts, they say the reused memory
> is almost always from methods that are loaded/used early when
> initializing the program. Then the memory gets reused by a method that
> runs for the bulk of the time. So by getting a timestamp for when the
> symbol was loaded you can then determine the length of time each symbol
> was in the anonymous memory. Then just associate the sample with the
> symbol that was present in the anonymous memory the longest. You can
> also tag the symbol names to say that they reside in the same memory,
> i.e., the memory was reused, to warn the user that there may be some
> ambiguity in the sample to symbol association.

We need a way to transfer the code too. Just symbols is not enough.
This needs exact tracking.

> The source file and line number information is optionally available from
> the jvm via the jvmti compiled method load callback. With this
> information on the symbol, line number and source file, perf could then
> persist the data in an ELF formatted file.

I'm not sure ELF/dwarf is the right way to go there. That would be very
complex.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2014-12-10 17:41 ` Carl Love 2014-12-10 18:09 ` Andi Kleen @ 2014-12-10 19:21 ` Arnaldo Carvalho de Melo 1 sibling, 0 replies; 29+ messages in thread From: Arnaldo Carvalho de Melo @ 2014-12-10 19:21 UTC (permalink / raw) To: Carl Love; +Cc: Andi Kleen, Brendan Gregg, Pekka Enberg, linux-perf-use. Em Wed, Dec 10, 2014 at 09:41:35AM -0800, Carl Love escreveu: > On Tue, 2014-12-09 at 16:38 -0800, Andi Kleen wrote: > > Arnaldo Carvalho de Melo <acme@kernel.org> writes: > > > > > When we use mmap(addr, len, PROT_EXEC) the kernel has a meta event > > > called PERF_RECORD_MMAP that will record addr range, symtab DSO path, > > > and we ask it to be timestamped, how to do that for the equivalent part > > > in the JVM? > > > > http://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html > > > > You just need a way to push it from the agent to perf. > > > > This may be more complex then needed. We really just need to track the > symbol. From talking with the Java experts, they say the reused memory > is almost always from methods that are loaded/used early when > initializing the program. Then the memory gets reused by a method that > runs for the bulk of the time. So by getting a timestamp for when the > symbol was loaded you can then determine the length of time each symbol > was in the anonymous memory. Then just associate the sample with the > symbol that was present in the anonymous memory the longest. You can > also tag the symbol names to say that they reside in the same memory, > ie, the memory was reused. To warn the user that there maybe some > ambiguity in the sample to symbol association. > > > > > > > > My initial thought was to find that using perf probe and insert there a > > > probe point, but I think that there may be already an existing > > > tracepoint in the jvm for that, one that, from what I've read so far, > > > is _not_ being used by this java perf agent, right? 
> > > > The agent interface works by linking an agent with a special library > > I believe. So the agent can do whatever it wants. > > > > From what I understood, how would it insert that event into the > > > perf.data event stream? Only if it necessarily involved a new mmap, via > > > the kernel, etc. > > > > You basically need a way to trigger a perf event from user space. > > One simple way would be to just write a perf.data too into /tmp, > > and let perf collect that. > > > > But it's more than that. You also need to transfer the executable > > code somewhere, and the line numbers. > > > > The source file and line number information is optionally available from > the jvm via the jvmti complied method load callback. With this Right, this could be used to synthesize DWARF for that, as described in the message I just sent. > information on the symbol, line number and source file, perf could then > persist the data in an ELF formatted file. > > > > > > > >> It just needs a better interface from the agent to perf, to pass all > > >> needed information, including symbols, line numbers, executable code > > >> (for PT decoding and for showing diassembler), and ordering it by time > > >> so that no hacks are needed. > > >> > > >> BTW other JITs (LLVM, Mono, V8, ...) have similar interfaces. > > >> > > >> Longer term as the kernel gets more JITed (eBPF etc.) it likely needs > > >> some kind of JIT interface too. > > > > > > Right, this is something we need to have, no questions about it, its > > > just a matter of cooking up some prototype implementation... > > > > > > If, for instance, the java agent would put on some file those events, > > > timestamped, then when in perf report we would just insert them into the > > > event stream as synthesized PERF_RECORD_MMAPs, probably that would be > > > enough. > > > > Really to be useful you need at least line numbers, better code. > > > > In theory the agent could write ELF files with dwarf, but that may get > > really ugly. 
Probably better to have something more light weight. > > Perf would need to do this so the data collection and ELF file > conversion will be done only when pref is doing the profiling of the > JITed code. Specifically, if perf is passed the pid of a running JVM the > agent library would not know when profiling starts and stops. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2014-12-10 0:38 ` Andi Kleen 2014-12-10 17:41 ` Carl Love @ 2014-12-10 19:19 ` Arnaldo Carvalho de Melo 1 sibling, 0 replies; 29+ messages in thread From: Arnaldo Carvalho de Melo @ 2014-12-10 19:19 UTC (permalink / raw) To: Andi Kleen; +Cc: Brendan Gregg, Carl Love, Pekka Enberg, linux-perf-use. On Tue, Dec 09, 2014 at 04:38:30PM -0800, Andi Kleen wrote: > Arnaldo Carvalho de Melo <acme@kernel.org> writes: > > > When we use mmap(addr, len, PROT_EXEC) the kernel has a meta event > > called PERF_RECORD_MMAP that will record addr range, symtab DSO path, > > and we ask it to be timestamped, how to do that for the equivalent part > > in the JVM? > > http://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html > > You just need a way to push it from the agent to perf. Yeah, things like the CompiledMethodLoad with a machinepc location format, etc: http://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html#CompiledMethodLoad http://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html#GetJLocationFormat JVMTI_JLOCATION_MACHINEPC 2 jlocation values represent native machine program counter values. I haven't had a look at any agent implementation, but it would have to use this kind of info to figure out if it would have to generate a new PERF_RECORD_MMAP with a new synthesized DSO map, i.e. if a method load would put some method in a different place, not reusing its previous addr range, etc, interesting. > > My initial thought was to find that using perf probe and insert there a > > probe point, but I think that there may be already an existing > > tracepoint in the jvm for that, one that, from what I've read so far, > > is _not_ being used by this java perf agent, right? > The agent interface works by linking an agent with a special library > I believe. So the agent can do whatever it wants.
From the quick look we receive some kind of event (CompiledMethodLoad) with addr ranges for JITted methods, etc. > > From what I understood, how would it insert that event into the > > perf.data event stream? Only if it necessarily involved a new mmap, via > > the kernel, etc. > > You basically need a way to trigger a perf event from user space. Yes if you want to reuse the existing tooling somehow, but we could have a perf data merger that would pick multiple perf.data files and, having a common clocksource, get a perf.data file generated by the agent with the relevant PERF_RECORD_MMAP records. From what I saw we could even generate PERF_RECORD_FORK/EXIT for native Java threads and use that mostly like native OS threads :-) > One simple way would be to just write a perf.data too into /tmp, > and let perf collect that. > > But it's more than that. You also need to transfer the executable > code somewhere, and the line numbers. Right, the easiest for the existing perf tooling would be to generate multiple ELF files with DWARF info that would be accessed by build-id inserted into a PERF_RECORD_MMAP3 record. > >> It just needs a better interface from the agent to perf, to pass all > >> needed information, including symbols, line numbers, executable code > >> (for PT decoding and for showing diassembler), and ordering it by time > >> so that no hacks are needed. > >> > >> BTW other JITs (LLVM, Mono, V8, ...) have similar interfaces. > >> > >> Longer term as the kernel gets more JITed (eBPF etc.) it likely needs > >> some kind of JIT interface too. > > > > Right, this is something we need to have, no questions about it, its > > just a matter of cooking up some prototype implementation... > > > > If, for instance, the java agent would put on some file those events, > > timestamped, then when in perf report we would just insert them into the > > event stream as synthesized PERF_RECORD_MMAPs, probably that would be > > enough.
> > Really to be useful you need at least line numbers, better code. > > In theory the agent could write ELF files with dwarf, but that may get > really ugly. Probably better to have something more light weight. For reducing the changes needed to tools/perf, that would be optimal, for the agent implementers... I guess not ;-) - Arnaldo ^ permalink raw reply [flat|nested] 29+ messages in thread
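The "perf data merger" idea above amounts to interleaving two timestamp-ordered event streams. A minimal sketch follows, assuming both streams are already sorted and share one clock (exactly the perf_clock condition raised elsewhere in the thread). The `(timestamp, kind, payload)` tuples and the name `merge_streams` are hypothetical and do not reflect the perf.data record layout.

```python
import heapq


def merge_streams(kernel_events, agent_events):
    """Merge two timestamp-sorted event lists into one ordered stream.

    Each event is a (timestamp, kind, payload) tuple; this layout is an
    assumption for illustration only. Both inputs must already be sorted
    and must use the same clock source.
    """
    return list(heapq.merge(kernel_events, agent_events, key=lambda e: e[0]))


# Samples recorded by the kernel side.
kernel = [(10, "SAMPLE", 0x1000), (30, "SAMPLE", 0x1010)]
# Synthesized mmap-like records written by the agent.
agent = [(5, "MMAP", "Foo.bar"), (20, "MMAP", "Foo.baz")]

merged = merge_streams(kernel, agent)
```

If the two clocks differ, the interleaving is wrong and samples may be attributed to a mapping that was not yet (or no longer) in place.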
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2014-12-09 22:22 ` Perf support for interpreted and Just-In-Time translated olanguages Arnaldo Carvalho de Melo 2014-12-10 0:38 ` Andi Kleen @ 2014-12-10 17:32 ` Andi Kleen 2014-12-10 17:39 ` David Ahern 1 sibling, 1 reply; 29+ messages in thread From: Andi Kleen @ 2014-12-10 17:32 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Andi Kleen, Brendan Gregg, Carl Love, Pekka Enberg, linux-perf-use. > I understood that there is a way to ask for the current JITted code -> > symtab, my question was specifically about how to get notifications when > those mappings change. JVMTI has a callback interface, so you can register call backs for specific events: http://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html#EventSection > When we use mmap(addr, len, PROT_EXEC) the kernel has a meta event > called PERF_RECORD_MMAP that will record addr range, symtab DSO path, > and we ask it to be timestamped, how to do that for the equivalent part > in the JVM? The CompiledMethodLoad callback would trigger that event. > > My initial thought was to find that using perf probe and insert there a > probe point, but I think that there may be already an existing > tracepoint in the jvm for that, one that, from what I've read so far, > is _not_ being used by this java perf agent, right? It's not needed. Java already has all the needed hooks. Maybe for some different JITs. > > From what I understood, how would it insert that event into the > perf.data event stream? Only if it necessarily involved a new mmap, via > the kernel, etc. That's the new interface to be defined. Just write a perf.data? -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2014-12-10 17:32 ` Andi Kleen @ 2014-12-10 17:39 ` David Ahern 2014-12-10 18:05 ` Andi Kleen 0 siblings, 1 reply; 29+ messages in thread From: David Ahern @ 2014-12-10 17:39 UTC (permalink / raw) To: Andi Kleen, Arnaldo Carvalho de Melo Cc: Brendan Gregg, Carl Love, Pekka Enberg, linux-perf-use. On 12/10/14 10:32 AM, Andi Kleen wrote: >> From what I understood, how would it insert that event into the >> perf.data event stream? Only if it necessarily involved a new mmap, via >> the kernel, etc. > > That's the new interface to be defined. > > Just write a perf.data? Pawel Moll's new ioctl -- assuming it gets committed. David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2014-12-10 17:39 ` David Ahern @ 2014-12-10 18:05 ` Andi Kleen 2014-12-10 18:27 ` David Ahern 0 siblings, 1 reply; 29+ messages in thread From: Andi Kleen @ 2014-12-10 18:05 UTC (permalink / raw) To: David Ahern Cc: Andi Kleen, Arnaldo Carvalho de Melo, Brendan Gregg, Carl Love, Pekka Enberg, linux-perf-use. On Wed, Dec 10, 2014 at 10:39:53AM -0700, David Ahern wrote: > On 12/10/14 10:32 AM, Andi Kleen wrote: > >> From what I understood, how would it insert that event into the > >>perf.data event stream? Only if it necessarily involved a new mmap, via > >>the kernel, etc. > > > >That's the new interface to be defined. > > > >Just write a perf.data? > > Pawel Moll's new ioctl -- assumming it gets committed. We can't push large volume data -- like line numbers and executable code -- through an ioctl. -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2014-12-10 18:05 ` Andi Kleen @ 2014-12-10 18:27 ` David Ahern 2014-12-10 19:43 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 29+ messages in thread From: David Ahern @ 2014-12-10 18:27 UTC (permalink / raw) To: Andi Kleen Cc: Arnaldo Carvalho de Melo, Brendan Gregg, Carl Love, Pekka Enberg, linux-perf-use. On 12/10/14 11:05 AM, Andi Kleen wrote: > On Wed, Dec 10, 2014 at 10:39:53AM -0700, David Ahern wrote: >> On 12/10/14 10:32 AM, Andi Kleen wrote: >>>> From what I understood, how would it insert that event into the >>>> perf.data event stream? Only if it necessarily involved a new mmap, via >>>> the kernel, etc. >>> >>> That's the new interface to be defined. >>> >>> Just write a perf.data? >> >> Pawel Moll's new ioctl -- assumming it gets committed. > > We can't push large volume data -- like line numbers and executable code - > through a ioctl. If you write a separate perf.data file then it has to be merged with the file generated by perf which brings in the perf_clock timestamp problem since timestamps are needed to merge the data sets. David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2014-12-10 18:27 ` David Ahern @ 2014-12-10 19:43 ` Arnaldo Carvalho de Melo 2015-01-09 20:19 ` Carl Love 0 siblings, 1 reply; 29+ messages in thread From: Arnaldo Carvalho de Melo @ 2014-12-10 19:43 UTC (permalink / raw) To: David Ahern Cc: Andi Kleen, Brendan Gregg, Carl Love, Pekka Enberg, linux-perf-use. On Wed, Dec 10, 2014 at 11:27:18AM -0700, David Ahern wrote: > On 12/10/14 11:05 AM, Andi Kleen wrote: > >On Wed, Dec 10, 2014 at 10:39:53AM -0700, David Ahern wrote: > >>On 12/10/14 10:32 AM, Andi Kleen wrote: > >>>> From what I understood, how would it insert that event into the > >>>>perf.data event stream? Only if it necessarily involved a new mmap, via > >>>>the kernel, etc. > >>>That's the new interface to be defined. > >>>Just write a perf.data? > >>Pawel Moll's new ioctl -- assumming it gets committed. > >We can't push large volume data -- like line numbers and executable code - > >through a ioctl. > If you write a separate perf.data file then it has to be merged with the > file generated by perf which brings in the perf_clock timestamp problem > since timestamps are needed to merge the data sets. Yeap, but what he is saying is kinda like the problem with long running perf sessions where a library may be updated while keeping the same pathname, i.e. samples taken up to the time the symtab backing storage was replaced would be unreliable. I think this should be solved in the same way, i.e. content based keys, that we call build-ids: every mapping, when put in place, generates an event, say PERF_RECORD_MMAP3, that comes with a key that can later be used to retrieve the matching ELF file with DWARF info for annotation, symbol resolution, unwinding, etc.
Modern distros have this and that is why we store in the perf.data file just the build ids, not the full ELF files at the time of the recording session:

[root@ssdandy ~]# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.475 MB perf.data (~20748 samples) ]
[root@ssdandy ~]# perf buildid-list
2aae14c5b9aef2a3ebce0f168814ca7391c7ea6b /usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux
e62a248621f017c39fbdeff22e3fb68ad3e0be77 /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/ata/libahci.ko
8acd5b3b420f0a37af9d661da5d6f88a86908c52 /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
37bed793bb0736d341002165538659862dcc9e4f /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/scsi/sd_mod.ko
8f0c72d8c938fdeecef941e47a586614ce1014e7 /lib/modules/3.10.0-123.el7.x86_64/kernel/fs/xfs/xfs.ko
9bb557cc045eb57af2f83b315389dd8287fb7f60 /lib/modules/3.10.0-123.el7.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko
cff370844d00ea5451d7add439646a93c64d48a5 /usr/lib64/libc-2.17.so
18562ee0363bc9bd7101610bd86469aa426d0c44 /usr/lib64/libpthread-2.17.so
ebd9fbf2265129ceab3866d40c826c9629f08cd0 [vdso]
a8cf24d1279557a3f2e563c21a858dcd8784b665 /usr/lib64/libcrypto.so.1.0.1e
148a981dc96876372c6f5c06bbb3efd0a1668432 /usr/sbin/sshd
ce3715b450fc4015a6763b210c868f930d2cffa6 /usr/bin/find
[root@ssdandy ~]#
[root@ssdandy ~]# ls -la perf.data
-rw-------. 1 root root 501828 Dec 10 16:26 perf.data
[root@ssdandy ~]# ls -la /usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux
-rwxr-xr-x. 2 root root 146606780 May 5 2014 /usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux
[root@ssdandy ~]# ls -la /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/ata/libahci.ko
-rw-r--r--. 2 root root 51561 May 5 2014 /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/ata/libahci.ko
[root@ssdandy ~]# ls -la /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
-rw-r--r--. 2 root root 400273 May 5 2014 /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
[root@ssdandy ~]# /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko

And by the build-id we can get to the right file, that may even not be installed on the target machine at the time of profiling.

[root@ssdandy ~]# perf buildid-list -i /usr/lib64/libc-2.17.so
cff370844d00ea5451d7add439646a93c64d48a5
[root@ssdandy ~]# ls -la /usr/lib/debug/.build-id/cf/f370844d00ea5451d7add439646a93c64d48a5
lrwxrwxrwx. 1 root root 33 Dec 10 16:32 /usr/lib/debug/.build-id/cf/f370844d00ea5451d7add439646a93c64d48a5 -> ../../../../../lib64/libc-2.17.so
[root@ssdandy ~]# rpm -qf /usr/lib/debug/.build-id/cf/f370844d00ea5451d7add439646a93c64d48a5
glibc-debuginfo-2.17-55.el7_0.1.x86_64
[root@ssdandy ~]# rpm -q glibc
glibc-2.17-55.el7_0.1.x86_64
[root@ssdandy ~]#

The agent would need to, using PERF_RECORD_MMAP3 + build-ids plus intercepting the CompiledMethodLoad synthesize the right ELF files + build-ids, which could be just one per workload, or multiple, if those methods gets recompiled frequently and move to different locations. And we may want even to have a different one per version of the compiled method, so that developers could figure out which compiled method generation is best :-)

- Arnaldo ^ permalink raw reply [flat|nested] 29+ messages in thread
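As a rough illustration of the content-based key idea above, an agent could hash the bytes of each compiled method and file the result under the two-level `.build-id/xx/rest` layout visible in the transcript. Using SHA-1 over the raw code bytes is an assumption made for this sketch (it yields the same 40-hex-digit width as the ids listed, but real build-ids are emitted by the linker), and `content_key` and `debug_path` are hypothetical helper names.

```python
import hashlib


def content_key(code: bytes) -> str:
    """Return a 40-hex-digit key derived from the code bytes.

    SHA-1 is an assumed choice for this sketch, not what any JVM agent
    or linker is known to do for JITed code.
    """
    return hashlib.sha1(code).hexdigest()


def debug_path(build_id: str, root: str = "/usr/lib/debug/.build-id") -> str:
    """Map a build-id to the two-level directory layout shown above."""
    return f"{root}/{build_id[:2]}/{build_id[2:]}"


# Two generations of the same method get distinct keys, so a sample can be
# matched to the exact version of the code that was executing.
gen1 = content_key(b"\x55\x48\x89\xe5")
gen2 = content_key(b"\x55\x48\x89\xe5\x90")
libc_path = debug_path("cff370844d00ea5451d7add439646a93c64d48a5")
```

With the libc id from the transcript, `libc_path` is the same `.build-id/cf/f370...` path that `ls -la` shows there.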
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2014-12-10 19:43 ` Arnaldo Carvalho de Melo @ 2015-01-09 20:19 ` Carl Love 2015-01-10 4:15 ` William Cohen 0 siblings, 1 reply; 29+ messages in thread From: Carl Love @ 2015-01-09 20:19 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, linux-perf-use. Arnaldo: > I think this should be solved in the same way, i.e. content based keys, > that we call build-ids, every mapping, when put in place generates an > event, say PERF_RECORD_MMAP3, that comes with a key that can later be > used to retrieve the matching ELF file with DWARF info for annotation, > symbol resolution, unwinding, etc. > > Modern distros have this and that is why we store in the perf.data file > just the build ids, not the full ELF files at the time of the recording > session: I have been looking at and trying to code up some JIT support for perf. I have written a library to register the callbacks from a Java application. That is all fairly easy stuff. I have been trying to figure out how to get this library to communicate and send an event record to perf so perf can write it into perf.data, as you mentioned above. Specifically we will want to write the load and unload records to perf.data. The library gets loaded into the Java application but if you try loading the library in perf, they do not share the same data space so you can't pass the data directly. The only solution I see is creating a shared memory space where the library can place the event info. Then perf will have to connect to the shared memory space and "watch" for data to show up. This seems really awkward and slower then what we need. We need the notifications to be put into the perf.data file as close in time as possible to the event to ensure proper mapping of the addresses. It also means only one Java program can be using the interface at a time. 
I have thought about pipes, callbacks, but I don't see any way to get these to work between the library loaded in the Java program and perf. Just wondering if you had any thoughts on how to do the communication? Carl Love ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2015-01-09 20:19 ` Carl Love @ 2015-01-10 4:15 ` William Cohen 2015-01-10 15:14 ` David Ahern 0 siblings, 1 reply; 29+ messages in thread From: William Cohen @ 2015-01-10 4:15 UTC (permalink / raw) To: Carl Love, Arnaldo Carvalho de Melo, linux-perf-use. On 01/09/2015 03:19 PM, Carl Love wrote: > Arnaldo: > >> I think this should be solved in the same way, i.e. content based keys, >> that we call build-ids, every mapping, when put in place generates an >> event, say PERF_RECORD_MMAP3, that comes with a key that can later be >> used to retrieve the matching ELF file with DWARF info for annotation, >> symbol resolution, unwinding, etc. >> >> Modern distros have this and that is why we store in the perf.data file >> just the build ids, not the full ELF files at the time of the recording >> session: > > I have been looking at and trying to code up some JIT support for perf. > I have written a library to register the callbacks from a Java > application. That is all fairly easy stuff. > > I have been trying to figure out how to get this library to communicate > and send an event record to perf so perf can write it into perf.data, as > you mentioned above. Specifically we will want to write the load and > unload records to perf.data. The library gets loaded into the Java > application but if you try loading the library in perf, they do not > share the same data space so you can't pass the data directly. > > The only solution I see is creating a shared memory space where the > library can place the event info. Then perf will have to connect to the > shared memory space and "watch" for data to show up. This seems really > awkward and slower then what we need. We need the notifications to be > put into the perf.data file as close in time as possible to the event to > ensure proper mapping of the addresses. It also means only one Java > program can be using the interface at a time. 
> > I have thought about pipes, callbacks, but I don't see any way to get > these to work between the library loaded in the Java program and perf. > Just wondering if you had any thoughts on how to do the communication? > > Carl Love Hi Carl, Too bad there isn't a "sys_perf_event" syscall to allow user-space applications to inject software-event-style entries into the kernel's recording of perf events. The AMD lightweight profiling mechanism specified the LWPINS instruction to insert a software event entry into the data buffer (http://support.amd.com/TechDocs/43724.pdf). Seems like the Linux kernel should have a similar mechanism to allow user and kernel-space to inject data into the perf records. -Will ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2015-01-10 4:15 ` William Cohen @ 2015-01-10 15:14 ` David Ahern 2015-01-12 17:22 ` Carl Love 0 siblings, 1 reply; 29+ messages in thread From: David Ahern @ 2015-01-10 15:14 UTC (permalink / raw) To: William Cohen, Carl Love, Arnaldo Carvalho de Melo, linux-perf-use. On 1/9/15 9:15 PM, William Cohen wrote: >> I have thought about pipes, callbacks, but I don't see any way to get >> these to work between the library loaded in the Java program and perf. >> Just wondering if you had any thoughts on how to do the communication? >> >> Carl Love > > Hi Carl, > > Too bad there isn't a "sys_perf_event" syscall to allow user-space applications to inject like a software event style entries into the kernel's recording of perf events. The AMD lightweight profiling mechanism specified the LWPINS instruction to insert a software event entry into the data buffer (http://support.amd.com/TechDocs/43724.pdf). Seems like the linux kernel should have a similar mechanism to allow user and kernel-space to inject data in the perf records. > > -Will https://lkml.org/lkml/2014/11/3/917 David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2015-01-10 15:14 ` David Ahern @ 2015-01-12 17:22 ` Carl Love 2015-01-12 17:58 ` David Ahern 0 siblings, 1 reply; 29+ messages in thread From: Carl Love @ 2015-01-12 17:22 UTC (permalink / raw) To: David Ahern; +Cc: William Cohen, Arnaldo Carvalho de Melo, linux-perf-use. On Sat, 2015-01-10 at 08:14 -0700, David Ahern wrote: > On 1/9/15 9:15 PM, William Cohen wrote: > >> I have thought about pipes, callbacks, but I don't see any way to get > >> these to work between the library loaded in the Java program and perf. > >> Just wondering if you had any thoughts on how to do the communication? > >> > >> Carl Love > > > > Hi Carl, > > > > Too bad there isn't a "sys_perf_event" syscall to allow user-space applications to inject like a software event style entries into the kernel's recording of perf events. The AMD lightweight profiling mechanism specified the LWPINS instruction to insert a software event entry into the data buffer (http://support.amd.com/TechDocs/43724.pdf). Seems like the linux kernel should have a similar mechanism to allow user and kernel-space to inject data in the perf records. > > > > -Will > > https://lkml.org/lkml/2014/11/3/917 > > David > David: Ah, this is the ioctl patch you had mentioned previously. I hadn't found the patch before. Yes, this looks like it would work. I will see if I can get a prototype working with this patch. Thanks. Carl Love ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2015-01-12 17:22 ` Carl Love @ 2015-01-12 17:58 ` David Ahern 2015-01-12 18:43 ` Carl Love 2015-01-20 18:19 ` Carl Love 0 siblings, 2 replies; 29+ messages in thread From: David Ahern @ 2015-01-12 17:58 UTC (permalink / raw) To: Carl Love; +Cc: William Cohen, Arnaldo Carvalho de Melo, linux-perf-use. On 1/12/15 10:22 AM, Carl Love wrote: > Ah, this is the ioctl patch you had mentioned you mentioned previously. > I hadn't found the patch before. Yes, this looks like it would work. I > will see if I can get a prototype working with this patch. Thanks. > If you need to shove samples into perf (versus mmap updates) I suspect the prctl system call will have way too much overhead. In that case perhaps processes could export a shared memory buffer that a perf session could attach -- another aux buffer similar to what itrace needs. But then that brings in the perf_clock issue; samples would need to have the same time basis as kernel generated samples. David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2015-01-12 17:58 ` David Ahern @ 2015-01-12 18:43 ` Carl Love 2015-01-20 18:19 ` Carl Love 1 sibling, 0 replies; 29+ messages in thread From: Carl Love @ 2015-01-12 18:43 UTC (permalink / raw) To: David Ahern; +Cc: William Cohen, Arnaldo Carvalho de Melo, linux-perf-use. On Mon, 2015-01-12 at 10:58 -0700, David Ahern wrote: > On 1/12/15 10:22 AM, Carl Love wrote: > > Ah, this is the ioctl patch you had mentioned you mentioned previously. > > I hadn't found the patch before. Yes, this looks like it would work. I > > will see if I can get a prototype working with this patch. Thanks. > > > > If you need to shove samples into perf (versus mmap updates) I suspect > the prctl system call will have way to much overhead. In that case > perhaps processes could export a shared memory buffer that a perf > session could attach -- another aux buffer similar to what itrace needs. > But then that brings in the perf_clock issue; samples would need to have > the same time basis as kernel generated samples. > > David David: My thought was to just send the load/unload notifications with a key of some sort that can then be used to correlate the load/unload key to the same key associated with an entry in a separate file where the code would be written. The build-id was mentioned by Arnaldo which could be used as the key. I was trying to figure out if I can generate the build-id key in the java library file. Not sure if that is practical at this point. Carl Love ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages 2015-01-12 17:58 ` David Ahern 2015-01-12 18:43 ` Carl Love @ 2015-01-20 18:19 ` Carl Love 2015-01-20 19:29 ` Arnaldo Carvalho de Melo 2015-01-23 8:25 ` Sujoy Saraswati 1 sibling, 2 replies; 29+ messages in thread From: Carl Love @ 2015-01-20 18:19 UTC (permalink / raw) To: David Ahern; +Cc: William Cohen, Arnaldo Carvalho de Melo, linux-perf-use. On Mon, 2015-01-12 at 10:58 -0700, David Ahern wrote: > On 1/12/15 10:22 AM, Carl Love wrote: > > Ah, this is the ioctl patch you had mentioned you mentioned previously. > > I hadn't found the patch before. Yes, this looks like it would work. I > > will see if I can get a prototype working with this patch. Thanks. > > > > If you need to shove samples into perf (versus mmap updates) I suspect > the prctl system call will have way to much overhead. In that case > perhaps processes could export a shared memory buffer that a perf > session could attach -- another aux buffer similar to what itrace needs. > But then that brings in the perf_clock issue; samples would need to have > the same time basis as kernel generated samples. > I have the kernel patch by Pawel working to send the source file name, the code address and the code size to perf and insert a new record into the perf.data file as a uevent. In the perf code, I have added code in file util/session.c, perf_session__deliver_event() to call a function to process the new event data. I am trying to start with just getting the samples mapped to the elf file. I will try to implement mapping the samples to a specific source code line later. My thought is I need to take the data and create a "fake" elf file entry with the file name, start address and code size so that perf will be able to map sample addresses to the elf file for the java method. I am struggling to understand the code flow for perf record, specifically when the elf files get read, mapping a sample to an elf file, etc.
It is not clear to me how I would go about creating the fake elf file and how to make it visible for use by the perf record tool. Any pointers and guidance would be appreciated. Thanks. Carl Love ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Perf support for interpreted and Just-In-Time translated olanguages
From: Arnaldo Carvalho de Melo @ 2015-01-20 19:29 UTC
To: Carl Love; +Cc: David Ahern, William Cohen, linux-perf-use.

On Tue, Jan 20, 2015 at 10:19:10AM -0800, Carl Love wrote:
> On Mon, 2015-01-12 at 10:58 -0700, David Ahern wrote:
> > On 1/12/15 10:22 AM, Carl Love wrote:
> > > Ah, this is the ioctl patch you had mentioned previously.
> > > I hadn't found the patch before. Yes, this looks like it would work. I
> > > will see if I can get a prototype working with this patch. Thanks.

> > If you need to shove samples into perf (versus mmap updates) I suspect
> > the prctl system call will have way too much overhead. In that case
> > perhaps processes could export a shared memory buffer that a perf
> > session could attach -- another aux buffer similar to what itrace needs.
> > But then that brings in the perf_clock issue; samples would need to have
> > the same time basis as kernel generated samples.

> I have the kernel patch by Pawel working to send the source file name,
> the code address and the code size to perf and insert a new record into
> the perf.data file as a uevent. In the perf code, I have added code in
> file util/session.c, perf_session__deliver_event() to call a function to
> process the new event data.

> I am trying to start with just getting the samples mapped to the elf
> file. I will try to implement mapping the samples to a specific source
> code line later.

> My thought is I need to take the data and create a "fake" elf file entry
> with the file name, start address and code size so that perf will be able
> to map sample addresses to the elf file for the java method. I am
> struggling to understand code flow for the perf record, specifically
> when the elf files get read, mapping a sample to an elf file, etc. It
> is not clear to me how I would go about creating the fake elf file and
> how to make it visible for use by the perf record tool. Any pointers
> and guidance would be appreciated. Thanks.

I don't think you need to create any fake ELF file. The way things work
is:

Somehow you start processing samples, be it by creating a perf_session
object passing a perf.data file, or directly like 'perf trace' does. The
perf_session method will, behind the scenes, do what perf trace does.

Then the callback you provided to perf_session, more specifically
perf_tool.sample(), will be called, and you will call some library
functions to ask it to find the thread, DSO and symbol for that
sample.

More specifically, the sequence map__load() -> dso__load() will take place
and at some point you will be able to call map__find_symbol() for a
given address and it will return a struct symbol.

Behind the scenes, the rb_tree that gets you from addr to symbol is
built either by using ELF routines to grab the symtab or from
/proc/kallsyms or a similar source, for instance, the Java JIT interface
described in tools/perf/Documentation/jit-interface.txt.

But yes, since you mention "mapping samples to a specific source code
line", we only have that, at this moment, for ELF files. If what you
want is that, i.e. source code annotation, then you should look at how
perf parses objdump output. We could conceivably have support for
other kinds of annotation sources, or you could instead generate output
that mimics what objdump produces, so that the current parser could grok
it: instead of calling objdump on an ELF file to produce that
output, we would call a routine that produces similar output from your
input.

Does that help? Do you need some more specific explanation?
- Arnaldo
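The address-to-symbol step described above can be illustrated outside of perf. The following is a minimal sketch, not perf's implementation: it assumes the perf map text format documented in tools/perf/Documentation/jit-interface.txt (one "START SIZE symbolname" entry per line, START and SIZE in hex) and uses a sorted list with binary search in place of perf's rb_tree. The function names parse_perf_map and find_symbol are hypothetical and exist only for this illustration.

```python
import bisect


def parse_perf_map(text):
    """Parse perf map text into a sorted list of (start, end, name) tuples.

    Each line is expected to be 'START SIZE symbolname', with START and
    SIZE in hex. The symbol name is the rest of the line and may contain
    spaces. Blank or malformed lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue
        start = int(parts[0], 16)
        size = int(parts[1], 16)
        entries.append((start, start + size, parts[2]))
    entries.sort()
    return entries


def find_symbol(entries, addr):
    """Return the name of the symbol whose [start, end) range holds addr."""
    starts = [entry[0] for entry in entries]
    # Index of the last entry whose start address is <= addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and entries[i][0] <= addr < entries[i][1]:
        return entries[i][2]
    return None
```

A sample address that falls between two entries, or past the end of the last one, resolves to None here; how perf itself reports such samples is outside the scope of this sketch.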
* Re: Perf support for interpreted and Just-In-Time translated olanguages
From: Carl Love @ 2015-01-20 20:34 UTC
To: Arnaldo Carvalho de Melo; +Cc: David Ahern, William Cohen, linux-perf-use.

On Tue, 2015-01-20 at 16:29 -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Jan 20, 2015 at 10:19:10AM -0800, Carl Love wrote:
> > On Mon, 2015-01-12 at 10:58 -0700, David Ahern wrote:
> > > On 1/12/15 10:22 AM, Carl Love wrote:
> > > > Ah, this is the ioctl patch you had mentioned previously.
> > > > I hadn't found the patch before. Yes, this looks like it would work. I
> > > > will see if I can get a prototype working with this patch. Thanks.
>
> > > If you need to shove samples into perf (versus mmap updates) I suspect
> > > the prctl system call will have way too much overhead. In that case
> > > perhaps processes could export a shared memory buffer that a perf
> > > session could attach -- another aux buffer similar to what itrace needs.
> > > But then that brings in the perf_clock issue; samples would need to have
> > > the same time basis as kernel generated samples.
>
> > I have the kernel patch by Pawel working to send the source file name,
> > the code address and the code size to perf and insert a new record into
> > the perf.data file as a uevent. In the perf code, I have added code in
> > file util/session.c, perf_session__deliver_event() to call a function to
> > process the new event data.
>
> > I am trying to start with just getting the samples mapped to the elf
> > file. I will try to implement mapping the samples to a specific source
> > code line later.
>
> > My thought is I need to take the data and create a "fake" elf file entry
> > with the file name, start address and code size so that perf will be able
> > to map sample addresses to the elf file for the java method. I am
> > struggling to understand code flow for the perf record, specifically
> > when the elf files get read, mapping a sample to an elf file, etc. It
> > is not clear to me how I would go about creating the fake elf file and
> > how to make it visible for use by the perf record tool. Any pointers
> > and guidance would be appreciated. Thanks.
>
> I don't think you need to create any fake ELF file. The way things work
> is:
>
> Somehow you start processing samples, be it by creating a perf_session
> object passing a perf.data file, or directly like 'perf trace' does. The
> perf_session method will, behind the scenes, do what perf trace does.
>
> Then the callback you provided to perf_session, more specifically
> perf_tool.sample(), will be called, and you will call some library
> functions to ask it to find the thread, DSO and symbol for that
> sample.
>
> More specifically, the sequence map__load() -> dso__load() will take place
> and at some point you will be able to call map__find_symbol() for a
> given address and it will return a struct symbol.

I see that mmap and mmap2 call map__new() in map.c to create a new map,
then call thread__insert_map() to add it into the rb_tree.

> Behind the scenes, the rb_tree that gets you from addr to symbol is
> built either by using ELF routines to grab the symtab or from
> /proc/kallsyms or a similar source, for instance, the Java JIT interface
> described in tools/perf/Documentation/jit-interface.txt.

Yes, I see where map__new() puts in the file name as a
"/tmp/perf-pid.map" entry. It looks like I will want to have the JIT
method name instead of "perf-pid". This would then give the mapping of
the sample back to the JIT method name. I will work on this a bit more.

> But yes, since you mention "mapping samples to a specific source code
> line", we only have that, at this moment, for ELF files. If what you
> want is that, i.e. source code annotation, then you should look at how
> perf parses objdump output. We could conceivably have support for
> other kinds of annotation sources, or you could instead generate output
> that mimics what objdump produces, so that the current parser could grok
> it: instead of calling objdump on an ELF file to produce that
> output, we would call a routine that produces similar output from your
> input.

The source code annotation would be nice in the future. I will be happy
to just start with something a bit simpler, i.e. mapping the addresses
to the method name. Best to get the basics figured out and then build
from there.

> Does that help? Do you need some more specific explanation?

Yes, it helps get me going in the right direction. Let me see if I can
get the mapping to the JIT method name working and I can then post what
I have so far and we can go from there. Thanks.

Carl Love
* Re: Perf support for interpreted and Just-In-Time translated olanguages
From: Arnaldo Carvalho de Melo @ 2015-01-20 20:52 UTC
To: Carl Love; +Cc: David Ahern, William Cohen, linux-perf-use.

On Tue, Jan 20, 2015 at 12:34:23PM -0800, Carl Love wrote:
> On Tue, 2015-01-20 at 16:29 -0300, Arnaldo Carvalho de Melo wrote:
> > Does that help? Do you need some more specific explanation?

> Yes, it helps get me going in the right direction. Let me see if I can
> get the mapping to the JIT method name working and I can then post what
> I have so far and we can go from there.

Great, agreed on the other points,

- Arnaldo
* Re: Perf support for interpreted and Just-In-Time translated olanguages
From: Sujoy Saraswati @ 2015-01-23 8:25 UTC
To: linux-perf-users

Hi all,

Carl Love <cel <at> us.ibm.com> writes:
>
> On Mon, 2015-01-12 at 10:58 -0700, David Ahern wrote:
> > On 1/12/15 10:22 AM, Carl Love wrote:
> > > Ah, this is the ioctl patch you had mentioned previously.
> > > I hadn't found the patch before. Yes, this looks like it would work. I
> > > will see if I can get a prototype working with this patch. Thanks.
> >
> > If you need to shove samples into perf (versus mmap updates) I suspect
> > the prctl system call will have way too much overhead. In that case
> > perhaps processes could export a shared memory buffer that a perf
> > session could attach -- another aux buffer similar to what itrace needs.
> > But then that brings in the perf_clock issue; samples would need to have
> > the same time basis as kernel generated samples.
>
> I have the kernel patch by Pawel working to send the source file name,
> the code address and the code size to perf and insert a new record into
> the perf.data file as a uevent. In the perf code, I have added code in
> file util/session.c, perf_session__deliver_event() to call a function to
> process the new event data.
>
> I am trying to start with just getting the samples mapped to the elf
> file. I will try to implement mapping the samples to a specific source
> code line later.
>
> My thought is I need to take the data and create a "fake" elf file entry
> with the file name, start address and code size so that perf will be able
> to map sample addresses to the elf file for the java method. I am
> struggling to understand code flow for the perf record, specifically
> when the elf files get read, mapping a sample to an elf file, etc. It
> is not clear to me how I would go about creating the fake elf file and
> how to make it visible for use by the perf record tool. Any pointers
> and guidance would be appreciated. Thanks.

FWIW, the HP caliper tool on HP-UX/IPF has some similar functionality:
it shows Java function names and annotated source code in the function
disassembly part of the report. Caliper runs as a separate process
measuring the target process by tracing it, and the target process can
register dynamically generated code/modules with caliper. The JVM code
for HP-UX was modified to use this framework to register the dynamically
generated methods with caliper at runtime.

As for source code annotation of code generated by the JVM's JIT
compiler, we use a Java option to dump the annotated assembly from the
JIT into a file, and also generate an index file to search the assembly
dump given an address. The index file is somewhat similar to a DWARF
line table, but it's just a plain text file and not in an ELF format.
When caliper shows the disassembly, it looks up the index file and finds
the relevant annotation to display for the address ranges in the
assembly. Since the annotated dump comes from Java itself, it contains
information like source line number, prologue region etc. For example,
if it is a safepoint, the information about Oop maps, bytecode index,
etc. is available in those annotations, which are very JVM specific.
Basically, the idea was to merge the JVM assembly annotations with the
caliper report.

I am not familiar with perf, but just wanted to bring up this
information to this forum to see if this could be an approach to
consider for the sample-to-source correlation. The source correlation
comes from a JIT compiler such as the JVM's, and can probably be made
into annotation records. Along with this, the index file could be used
to correlate the annotation records with the address ranges seen in
samples.

Regards,
Sujoy
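To make the index-file idea concrete, here is a minimal sketch of a line-table style lookup. The format of the caliper index file is not given in the message, so everything below is an assumption made for illustration: a plain text file with one "ADDR FILE LINE" row per line (ADDR in hex), where each row marks the address at which the code for a source line begins. The names load_line_index and lookup_line are hypothetical.

```python
import bisect


def load_line_index(text):
    """Parse hypothetical 'ADDR FILE LINE' rows into a sorted list.

    ADDR is hex and LINE is decimal. Rows that do not have exactly
    three fields are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        rows.append((int(fields[0], 16), fields[1], int(fields[2])))
    rows.sort()
    return rows


def lookup_line(rows, addr):
    """Return (file, line) for the last row whose address is <= addr."""
    addresses = [row[0] for row in rows]
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return None  # addr lies before the first row
    return rows[i][1], rows[i][2]
```

Unlike a symbol lookup, a table like this carries no end address, so an address past the last row still resolves to the last source line. A real index would need either explicit ranges or a bound taken from the size of the enclosing method.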
* Re: Perf support for interpreted and Just-In-Time translated languages
From: Pekka Enberg @ 2014-12-10 7:55 UTC
To: Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Carl Love, Pekka Enberg, linux-perf-use.

On 12/9/14 10:34 PM, Arnaldo Carvalho de Melo wrote:
>>> - How does this work for the offline analysis scenario (i.e., using 'perf archive')?
>>> Would the /tmp/perf-<pid>.map files have to be copied over to the host system where
>>> the analysis is being done?
>> Yes. I keep copies of the perf.map along with the perf.data. It might
>> be worth having an option to perf to change the base path for these
>> maps, so that I didn't have to keep putting them in /tmp.
> Right, this was not really designed, was just a proof of concept for
> JATO needs, right Pekka?

Indeed. And like with all useful proofs of concept, people started to
use it elsewhere as well. :-)

- Pekka
* Perf support for interpreted and Just-In-Time translated languages
From: William Cohen @ 2014-12-02 21:08 UTC
To: linux-perf-users

perf makes use of the debug information provided by the compilers to
map the addresses observed in the instruction pointer and on the stack
back to source code. This works very well for traditional compiled
programs written in C and C++. However, the assumption that the
instruction address maps back to something the user wrote is not true
for code written in interpreted languages such as Python, Perl, and
Ruby or for the Just-In-Time (JIT) runtime environments commonly used
for Java. The addresses would either map back to the interpreter runtime
or dynamically generated code. It would be really nice if perf was
enhanced to provide data about where in the interpreted and JIT'ed
code the processor was spending time.

OProfile provides the ability to map samples from Java Runtime
Environment (JRE) JIT code using a shared library agent loaded when
the program starts executing. The shared library uses the JVMTI or
JVMPI interface to note the method that each region of JIT'ed code
maps to. This is later used to map the instruction pointer back to
the appropriate Java method. There is some information on how this is
implemented at http://oprofile.sourceforge.net/doc/devel/index.html.

For traditional interpreters the samples perf collects get mapped to
the internals of the interpreter. Rather than getting samples that map
back to the developer's Ruby code, developers get samples that map back
to the internals of the Ruby interpreter, which they have little control
over and little understanding of. What would be desired is for each
memory map region or process to have something that indicates what kind
of information perf should record for a sample in that region or
process. By default this would fall back on the traditional IP sampling,
but allow some user-space memory locations to be read for a line number
and dcookie for the file that the code came from instead.

-Will
* Re: Perf support for interpreted and Just-In-Time translated languages
From: Brendan Gregg @ 2014-12-03 2:36 UTC
To: William Cohen; +Cc: linux-perf-use.

G'Day Will,

On Tue, Dec 2, 2014 at 1:08 PM, William Cohen <wcohen@redhat.com> wrote:
> perf makes use of the debug information provided by the compilers to
> map the addresses observed in the instruction pointer and on the stack
> back to source code. This works very well for traditional compiled
> programs written in C and C++. However, the assumption that the
> instruction address maps back to something the user wrote is not true
> for code written in interpreted languages such as Python, Perl, and
> Ruby or for the Just-In-Time (JIT) runtime environments commonly used
> for Java. The addresses would either map back to the interpreter runtime
> or dynamically generated code. It would be really nice if perf was
> enhanced to provide data about where in the interpreted and JIT'ed
> code the processor was spending time.

perf supports the /tmp/perf-PID.map files for JIT translations. It's
up to the runtimes to create these files.

I was enhancing the Java perf-map-agent today
(https://github.com/jrudolph/perf-map-agent), and using it with perf.
perf doesn't seem to handle map files that grow (and overwrite
symbols) very well, so I had to create an extra step that cleaned up
the map file. I should write up the Java instructions somewhere.

I did do a writeup for Node.js, whose v8 engine supports the perf map
files. See: http://www.brendangregg.com/blog/2014-09-17/node-flame-graphs-on-linux.html

Also see tools/perf/Documentation/jit-interface.txt

> OProfile provides the ability to map samples from Java Runtime
> Environment (JRE) JIT code using a shared library agent loaded when
> the program starts executing. The shared library uses the JVMTI or
> JVMPI interface to note the method that each region of JIT'ed code
> maps to. This is later used to map the instruction pointer back to
> the appropriate Java method. There is some information on how this is
> implemented at http://oprofile.sourceforge.net/doc/devel/index.html.

Yes, that's exactly what perf-map-agent does (JVMTI). I only just
created the pull request, but if you try perf-map-agent, you'll want
to use the fflush fix to avoid buffering lag
(https://github.com/jrudolph/perf-map-agent/pull/8).

Brendan

--
http://www.brendangregg.com
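The message does not say what the map-file cleanup step did, so the following is only a sketch of one plausible policy, assumed here for illustration: treat the file as an append-only log and let the most recent entry win, dropping any earlier entry whose address range it overlaps. Entries use the "START SIZE symbolname" hex format from tools/perf/Documentation/jit-interface.txt. The function name clean_perf_map is hypothetical.

```python
def clean_perf_map(lines):
    """Return perf map lines with overwritten symbols removed.

    Entries are processed in file order. Each new entry evicts every
    earlier entry whose [start, end) range overlaps its own, so only
    the latest symbol at any address survives. Output is sorted by
    start address.
    """
    live = []  # surviving (start, end, name) entries
    for line in lines:
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start = int(parts[0], 16)
        end = start + int(parts[1], 16)
        # Keep only the entries that do not overlap the new range.
        live = [e for e in live if e[1] <= start or e[0] >= end]
        live.append((start, end, parts[2]))
    live.sort()
    return ["%x %x %s" % (s, e - s, name) for s, e, name in live]
```

This discards the history of what lived at an address earlier in the run, so samples taken before a method was recompiled may be attributed to the newer symbol. The linear scan per entry is quadratic in the worst case, which is acceptable for a sketch but may be slow on very large map files.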