* fix callgraphs of 32-bit processes on 64-bit kernels.
@ 2010-03-15 15:34 Török Edwin
2010-03-15 15:34 ` [PATCH] perf: x86: " Török Edwin
2010-03-15 16:23 ` Török Edwin
0 siblings, 2 replies; 22+ messages in thread
From: Török Edwin @ 2010-03-15 15:34 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras,
x86, linux-kernel
Hi,
Callgraph profiling 32-bit apps on a 64-bit kernel doesn't work.
The reason is that perf_callchain_user tries to read a stackframe with 64-bit
pointers, which is wrong for a 32-bit process.
This patch fixes that, and I am almost able to get nice callgraph profiles
from 32-bit apps now! (except for some problems with perf itself when tracing
kernel modules, see [1])
Page-faults can be traced nicely (sid-ia32 is a 32-bit chroot):
$ sudo perf record -e page-faults -f -g /home/edwin/sid-ia32/usr/bin/glxgears
$ sudo perf report
...
45.33% libc-2.10.2.so [.] __GI_memcpy
|
--- __GI_memcpy
_mesa_BufferDataARB
_mesa_meta_Clear
radeonUserClear
r700Clear
_mesa_Clear
0x8049367
0x804a6ba
__libc_start_main
0x8049111
16.96% libc-2.10.2.so [.] __GI_memset
|
--- __GI_memset
_tnl_init_vertices
_swsetup_CreateContext
r600CreateContext
driCreateNewContext
dri2CreateNewContext
0xf77ab7dd
0xf7783c67
0xf778514c
0x804974f
0x804a33d
__libc_start_main
0x8049111
And CPU cycles can be traced too in userspace:
$ sudo perf record -f -g /home/edwin/sid-ia32/usr/bin/glxgears
$ sudo perf report --sort comm,dso
[...]
44.97% glxgears r600_dri.so
|
|--5.85%-- r700SendSPIState
| radeonEmitState
| r700DrawPrims
| |
| |--95.45%-- vbo_save_playback_vertex_list
| | execute_list
| | _mesa_CallList
| | neutral_CallList
| | |
| | |--38.10%-- 0x80494a8
| | | 0x804a6ba
| | | __libc_start_main
| | | 0x8049111
[....]
40.00% glxgears [kernel]
|
|--3.14%-- copy_user_generic_string
| |
| |--71.70%-- 0xffffffffa01b4493
| | 0xffffffffa01b7c0b
| | 0xffffffffa018b45b
| | 0xffffffffa00ca927
| | 0xffffffffa01c524e
| | compat_sys_ioctl
| | sysenter_dispatch
| | 0xf77ca430
| | drmCommandWriteRead
| | 0xf74d7ab5
| | 0xf74d89a4
| | rcommonFlushCmdBufLocked
| | rcommonFlushCmdBuf
| | radeonFlush
| | _mesa_flush
| | _mesa_Flush
| | 0xf775f270
| | 0x804a6d5
| | __libc_start_main
| | 0x8049111
| |
| |--15.09%-- 0xffffffffa01c524e
| | compat_sys_ioctl
| | sysenter_dispatch
| | 0xf77ca430
| | drmCommandWriteRead
[1] But there is a problem with the perf tool: it can't trace addresses in
kernel modules. This is a problem regardless if the traced app is 32-bit or
64-bit; and regardless if I do callgraph profiling or not.
See the above trace, where the kernel addresses have all ffffffffa0* without a
symbol name.
If I look at /proc/kallsyms I can guess the symbols, for example
0xffffffffa01b4493 is probably this one:
ffffffffa01b4411 t r600_cs_packet_parse [radeon]
If I record/report without callgraph its the same problem:
[...]
24.01% glxgears [kernel] [k] 0xffffffffa01b4ee9
3.96% glxgears libdrm_radeon.so.1.0.0 [.] cs_gem_write_reloc
3.53% glxgears r600_dri.so [.] r700SendSPIState
2.77% glxgears r600_dri.so [.] r700DrawPrims
1.99% glxgears r600_dri.so [.] r700SendVSConsts
Kernel symbol for 0xffffffffa01b4ee9 is not shown, I can guess it is this one
(hey it was an exact match!):
ffffffffa01b4ee9 t r600_packet3_check [radeon]
It would be good if perf knew how to lookup symbols in kernel modules!
Best regards,
--Edwin
^ permalink raw reply [flat|nested] 22+ messages in thread* [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels. 2010-03-15 15:34 fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin @ 2010-03-15 15:34 ` Török Edwin 2010-03-16 14:49 ` Frederic Weisbecker 2010-03-15 16:23 ` Török Edwin 1 sibling, 1 reply; 22+ messages in thread From: Török Edwin @ 2010-03-15 15:34 UTC (permalink / raw) To: Ingo Molnar Cc: Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras, x86, linux-kernel, Török Edwin When profiling a 32-bit process on a 64-bit kernel, callgraph tracing stopped after the first function, because it has seen a garbage memory address (tried to interpret the frame pointer, and return address as a 64-bit pointer). Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. Note that TIF_IA32 flag must be used, and not is_compat_task(), because the latter is only set when the 32-bit process is executing a syscall, which may not always be the case (when tracing page fault events for example). Signed-off-by: Török Edwin <edwintorok@gmail.com> --- arch/x86/kernel/cpu/perf_event.c | 33 +++++++++++++++++++++++++++++++++ 1 files changed, 33 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 8c1c070..13ee83a 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame) return bytes == sizeof(*frame); } +struct stack_frame_ia32 { + u32 next_frame; + u32 return_address; +}; + +static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame) +{ + unsigned long bytes; + + bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame)); + + return bytes == sizeof(*frame); +} + static void perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) { @@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) callchain_store(entry, PERF_CONTEXT_USER); callchain_store(entry, regs->ip); + if (test_thread_flag(TIF_IA32)) { + /* 32-bit process in 64-bit kernel. */ + u32 fp = regs->bp; + struct stack_frame_ia32 frame; + while (entry->nr < PERF_MAX_STACK_DEPTH) { + frame.next_frame = 0; + frame.return_address = 0; + + if (!copy_stack_frame_ia32(fp, &frame)) + break; + + if ((unsigned long)fp < regs->sp) + break; + + callchain_store(entry, frame.return_address); + fp = frame.next_frame; + } + return; + } while (entry->nr < PERF_MAX_STACK_DEPTH) { frame.next_frame = NULL; -- 1.7.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels. 2010-03-15 15:34 ` [PATCH] perf: x86: " Török Edwin @ 2010-03-16 14:49 ` Frederic Weisbecker 2010-03-16 15:02 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Török Edwin 2010-03-16 15:04 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin 0 siblings, 2 replies; 22+ messages in thread From: Frederic Weisbecker @ 2010-03-16 14:49 UTC (permalink / raw) To: Török Edwin Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras, x86, linux-kernel On Mon, Mar 15, 2010 at 05:34:20PM +0200, Török Edwin wrote: > When profiling a 32-bit process on a 64-bit kernel, callgraph tracing > stopped after the first function, because it has seen a garbage memory address > (tried to interpret the frame pointer, and return address as a 64-bit pointer). > > Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. > > Note that TIF_IA32 flag must be used, and not is_compat_task(), because the > latter is only set when the 32-bit process is executing a syscall, > which may not always be the case (when tracing page fault events for example). > > Signed-off-by: Török Edwin <edwintorok@gmail.com> > --- > arch/x86/kernel/cpu/perf_event.c | 33 +++++++++++++++++++++++++++++++++ > 1 files changed, 33 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c > index 8c1c070..13ee83a 100644 > --- a/arch/x86/kernel/cpu/perf_event.c > +++ b/arch/x86/kernel/cpu/perf_event.c > @@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame) > return bytes == sizeof(*frame); > } > > +struct stack_frame_ia32 { > + u32 next_frame; > + u32 return_address; > +}; > + > +static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame) > +{ > + unsigned long bytes; > + > + bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame)); > + > + return bytes == sizeof(*frame); > +} > + > static void > perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) > { > @@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) > > callchain_store(entry, PERF_CONTEXT_USER); > callchain_store(entry, regs->ip); > + if (test_thread_flag(TIF_IA32)) { > + /* 32-bit process in 64-bit kernel. */ > + u32 fp = regs->bp; > + struct stack_frame_ia32 frame; > + while (entry->nr < PERF_MAX_STACK_DEPTH) { > + frame.next_frame = 0; > + frame.return_address = 0; > + > + if (!copy_stack_frame_ia32(fp, &frame)) > + break; > + > + if ((unsigned long)fp < regs->sp) > + break; Shouldn't it be this? if (fp < (u32)regs->sp) As the high part of fp is zeroed in the cast but the high part of regs->sp remains. I don't know what could be there, but since the user doesn't use it, perhaps just garbage? And that could messup the comparison. Otherwise the rest looks pretty good to me, nice catch. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2. 2010-03-16 14:49 ` Frederic Weisbecker @ 2010-03-16 15:02 ` Török Edwin 2010-03-16 17:05 ` Ingo Molnar 2010-03-16 15:04 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin 1 sibling, 1 reply; 22+ messages in thread From: Török Edwin @ 2010-03-16 15:02 UTC (permalink / raw) To: Frederic Weisbecker Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras, x86, linux-kernel, Török Edwin When profiling a 32-bit process on a 64-bit kernel, callgraph tracing stopped after the first function, because it has seen a garbage memory address (tried to interpret the frame pointer, and return address as a 64-bit pointer). Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. Note that TIF_IA32 flag must be used, and not is_compat_task(), because the latter is only set when the 32-bit process is executing a syscall, which may not always be the case (when tracing page fault events for example). Signed-off-by: Török Edwin <edwintorok@gmail.com> --- arch/x86/kernel/cpu/perf_event.c | 33 +++++++++++++++++++++++++++++++++ 1 files changed, 33 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 8c1c070..b85ea9f 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame) return bytes == sizeof(*frame); } +struct stack_frame_ia32 { + u32 next_frame; + u32 return_address; +}; + +static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame) +{ + unsigned long bytes; + + bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame)); + + return bytes == sizeof(*frame); +} + static void perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) { @@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) callchain_store(entry, PERF_CONTEXT_USER); callchain_store(entry, regs->ip); + if (test_thread_flag(TIF_IA32)) { + /* 32-bit process in 64-bit kernel. */ + u32 fp = regs->bp; + struct stack_frame_ia32 frame; + while (entry->nr < PERF_MAX_STACK_DEPTH) { + frame.next_frame = 0; + frame.return_address = 0; + + if (!copy_stack_frame_ia32(fp, &frame)) + break; + + if (fp < (u32)regs->sp) + break; + + callchain_store(entry, frame.return_address); + fp = frame.next_frame; + } + return; + } while (entry->nr < PERF_MAX_STACK_DEPTH) { frame.next_frame = NULL; -- 1.7.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2. 2010-03-16 15:02 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Török Edwin @ 2010-03-16 17:05 ` Ingo Molnar 2010-03-17 8:48 ` Török Edwin 0 siblings, 1 reply; 22+ messages in thread From: Ingo Molnar @ 2010-03-16 17:05 UTC (permalink / raw) To: T??r??k Edwin, H. Peter Anvin Cc: Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras, x86, linux-kernel * T??r??k Edwin <edwintorok@gmail.com> wrote: > When profiling a 32-bit process on a 64-bit kernel, callgraph tracing > stopped after the first function, because it has seen a garbage memory address > (tried to interpret the frame pointer, and return address as a 64-bit pointer). > > Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. > > Note that TIF_IA32 flag must be used, and not is_compat_task(), because the > latter is only set when the 32-bit process is executing a syscall, > which may not always be the case (when tracing page fault events for example). > > Signed-off-by: T??r??k Edwin <edwintorok@gmail.com> > --- > arch/x86/kernel/cpu/perf_event.c | 33 +++++++++++++++++++++++++++++++++ > 1 files changed, 33 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c > index 8c1c070..b85ea9f 100644 > --- a/arch/x86/kernel/cpu/perf_event.c > +++ b/arch/x86/kernel/cpu/perf_event.c > @@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame) > return bytes == sizeof(*frame); > } > > +struct stack_frame_ia32 { > + u32 next_frame; > + u32 return_address; > +}; Please put such new data type definitions not into the middle of a .c file but next to where struct stack_frame is defined. > + > +static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame) > +{ > + unsigned long bytes; > + > + bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame)); > + > + return bytes == sizeof(*frame); > +} > Single-use - should be inline i guess. > + > static void > perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) > { > @@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) > > callchain_store(entry, PERF_CONTEXT_USER); > callchain_store(entry, regs->ip); > + if (test_thread_flag(TIF_IA32)) { > + /* 32-bit process in 64-bit kernel. */ > + u32 fp = regs->bp; > + struct stack_frame_ia32 frame; > + while (entry->nr < PERF_MAX_STACK_DEPTH) { Please put newlines after local variable definition so that they are clearly delimited. Also, the tabulation is weird - please run it through scripts/checkpatch.pl. > + frame.next_frame = 0; > + frame.return_address = 0; > + > + if (!copy_stack_frame_ia32(fp, &frame)) > + break; > + > + if (fp < (u32)regs->sp) > + break; > + > + callchain_store(entry, frame.return_address); > + fp = frame.next_frame; > + } > + return; This whole new block should probably be in a helper inline? Also, it should probably be #ifdef CONFIG_COMPAT or so. Thanks, Ingo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2. 2010-03-16 17:05 ` Ingo Molnar @ 2010-03-17 8:48 ` Török Edwin 2010-03-17 8:49 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3 Török Edwin 2010-03-17 9:59 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Ingo Molnar 0 siblings, 2 replies; 22+ messages in thread From: Török Edwin @ 2010-03-17 8:48 UTC (permalink / raw) To: Ingo Molnar Cc: H. Peter Anvin, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Paul Mackerras, x86, linux-kernel On 03/16/2010 07:05 PM, Ingo Molnar wrote: > * T??r??k Edwin <edwintorok@gmail.com> wrote: > >> When profiling a 32-bit process on a 64-bit kernel, callgraph tracing >> stopped after the first function, because it has seen a garbage memory address >> (tried to interpret the frame pointer, and return address as a 64-bit pointer). >> >> Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. >> >> Note that TIF_IA32 flag must be used, and not is_compat_task(), because the >> latter is only set when the 32-bit process is executing a syscall, >> which may not always be the case (when tracing page fault events for example). >> >> Signed-off-by: T??r??k Edwin <edwintorok@gmail.com> >> --- >> arch/x86/kernel/cpu/perf_event.c | 33 +++++++++++++++++++++++++++++++++ >> 1 files changed, 33 insertions(+), 0 deletions(-) >> >> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c >> index 8c1c070..b85ea9f 100644 >> --- a/arch/x86/kernel/cpu/perf_event.c >> +++ b/arch/x86/kernel/cpu/perf_event.c >> @@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame) >> return bytes == sizeof(*frame); >> } >> >> +struct stack_frame_ia32 { >> + u32 next_frame; >> + u32 return_address; >> +}; > > Please put such new data type definitions not into the middle of a .c file but > next to where struct stack_frame is defined. Ok. > >> + >> +static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame) >> +{ >> + unsigned long bytes; >> + >> + bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame)); >> + >> + return bytes == sizeof(*frame); >> +} >> > > Single-use - should be inline i guess. > So should be copy_stack_frame() then. >> + >> static void >> perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) >> { >> @@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) >> >> callchain_store(entry, PERF_CONTEXT_USER); >> callchain_store(entry, regs->ip); >> + if (test_thread_flag(TIF_IA32)) { >> + /* 32-bit process in 64-bit kernel. */ >> + u32 fp = regs->bp; >> + struct stack_frame_ia32 frame; >> + while (entry->nr < PERF_MAX_STACK_DEPTH) { > > Please put newlines after local variable definition so that they are clearly > delimited. > > Also, the tabulation is weird - please run it through scripts/checkpatch.pl. I forgot to change shiftwidth to 8 (I usually use 4). I cleaned the checkpatch warnings now. > >> + frame.next_frame = 0; >> + frame.return_address = 0; >> + >> + if (!copy_stack_frame_ia32(fp, &frame)) >> + break; >> + >> + if (fp < (u32)regs->sp) >> + break; >> + >> + callchain_store(entry, frame.return_address); >> + fp = frame.next_frame; >> + } >> + return; > > This whole new block should probably be in a helper inline? To reduce indenting, or why? > > Also, it should probably be #ifdef CONFIG_COMPAT or so. Ok, see V3 of my patch. Best regards, --Edwin ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3. 2010-03-17 8:48 ` Török Edwin @ 2010-03-17 8:49 ` Török Edwin 2010-03-17 9:54 ` Ingo Molnar 2010-03-17 10:07 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4 Török Edwin 2010-03-17 9:59 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Ingo Molnar 1 sibling, 2 replies; 22+ messages in thread From: Török Edwin @ 2010-03-17 8:49 UTC (permalink / raw) To: Ingo Molnar Cc: Török Edwin, H. Peter Anvin, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Paul Mackerras, x86, linux-kernel When profiling a 32-bit process on a 64-bit kernel, callgraph tracing stopped after the first function, because it has seen a garbage memory address (tried to interpret the frame pointer, and return address as a 64-bit pointer). Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. Note that TIF_IA32 flag must be used, and not is_compat_task(), because the latter is only set when the 32-bit process is executing a syscall, which may not always be the case (when tracing page fault events for example). Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Török Edwin <edwintorok@gmail.com> --- arch/x86/kernel/cpu/perf_event.c | 38 +++++++++++++++++++++++++++++++++----- arch/x86/kernel/dumpstack.h | 5 +++++ 2 files changed, 38 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 8c1c070..51f72f7 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -26,6 +26,7 @@ #include <asm/apic.h> #include <asm/stacktrace.h> #include <asm/nmi.h> +#include <asm/compat.h> static u64 perf_event_mask __read_mostly; @@ -2392,14 +2393,32 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n) return len; } -static int copy_stack_frame(const void __user *fp, struct stack_frame *frame) +#ifdef CONFIG_COMPAT + +static inline void +perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry) { - unsigned long bytes; + /* 32-bit process in 64-bit kernel. */ + struct stack_frame_ia32 frame; + const void __user *fp = compat_ptr(regs->bp); + + while (entry->nr < PERF_MAX_STACK_DEPTH) { + unsigned long bytes; + frame.next_frame = 0; + frame.return_address = 0; + + bytes = copy_from_user_nmi(&frame, fp, sizeof(frame)); + if (bytes != sizeof(frame)) + break; - bytes = copy_from_user_nmi(frame, fp, sizeof(*frame)); + if (fp < compat_ptr(regs->sp)) + break; - return bytes == sizeof(*frame); + callchain_store(entry, frame.return_address); + fp = compat_ptr(frame.next_frame); + } } +#endif static void perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) @@ -2415,11 +2434,20 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) callchain_store(entry, PERF_CONTEXT_USER); callchain_store(entry, regs->ip); +#ifdef CONFIG_COMPAT + if (test_thread_flag(TIF_IA32)) { + perf_callchain_user32(regs, entry); + return; + } +#endif + while (entry->nr < PERF_MAX_STACK_DEPTH) { + unsigned long bytes; frame.next_frame = NULL; frame.return_address = 0; - if (!copy_stack_frame(fp, &frame)) + bytes = copy_from_user_nmi(&frame, fp, sizeof(frame)); + if (bytes != sizeof(frame)) break; if ((unsigned long)fp < regs->sp) diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h index 4fd1420..3f0a242 100644 --- a/arch/x86/kernel/dumpstack.h +++ b/arch/x86/kernel/dumpstack.h @@ -29,4 +29,9 @@ struct stack_frame { struct stack_frame *next_frame; unsigned long return_address; }; + +struct stack_frame_ia32 { + u32 next_frame; + u32 return_address; +}; #endif -- 1.7.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3. 2010-03-17 8:49 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3 Török Edwin @ 2010-03-17 9:54 ` Ingo Molnar 2010-03-17 10:07 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4 Török Edwin 1 sibling, 0 replies; 22+ messages in thread From: Ingo Molnar @ 2010-03-17 9:54 UTC (permalink / raw) To: T??r??k Edwin Cc: H. Peter Anvin, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Paul Mackerras, x86, linux-kernel * T??r??k Edwin <edwintorok@gmail.com> wrote: > +#ifdef CONFIG_COMPAT > + if (test_thread_flag(TIF_IA32)) { > + perf_callchain_user32(regs, entry); > + return; > + } > +#endif You can push that test into perf_callchain_user32() and turn it into something like: if (perf_callchain_user32(regs, entry)) return; without the #ifdef here. Ingo ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4. 2010-03-17 8:49 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3 Török Edwin 2010-03-17 9:54 ` Ingo Molnar @ 2010-03-17 10:07 ` Török Edwin 2010-03-30 23:18 ` Frederic Weisbecker 1 sibling, 1 reply; 22+ messages in thread From: Török Edwin @ 2010-03-17 10:07 UTC (permalink / raw) To: Ingo Molnar Cc: Török Edwin, H. Peter Anvin, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Paul Mackerras, x86, linux-kernel When profiling a 32-bit process on a 64-bit kernel, callgraph tracing stopped after the first function, because it has seen a garbage memory address (tried to interpret the frame pointer, and return address as a 64-bit pointer). Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. Note that TIF_IA32 flag must be used, and not is_compat_task(), because the latter is only set when the 32-bit process is executing a syscall, which may not always be the case (when tracing page fault events for example). Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Török Edwin <edwintorok@gmail.com> --- arch/x86/kernel/cpu/perf_event.c | 44 +++++++++++++++++++++++++++++++++---- arch/x86/kernel/dumpstack.h | 5 ++++ 2 files changed, 44 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 8c1c070..5f97f29 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -26,6 +26,7 @@ #include <asm/apic.h> #include <asm/stacktrace.h> #include <asm/nmi.h> +#include <asm/compat.h> static u64 perf_event_mask __read_mostly; @@ -2392,14 +2393,42 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n) return len; } -static int copy_stack_frame(const void __user *fp, struct stack_frame *frame) +#ifdef CONFIG_COMPAT +static inline int +perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry) { - unsigned long bytes; + /* 32-bit process in 64-bit kernel. */ + struct stack_frame_ia32 frame; + const void __user *fp; + + if (!test_thread_flag(TIF_IA32)) + return 0; + + fp = compat_ptr(regs->bp); + while (entry->nr < PERF_MAX_STACK_DEPTH) { + unsigned long bytes; + frame.next_frame = 0; + frame.return_address = 0; - bytes = copy_from_user_nmi(frame, fp, sizeof(*frame)); + bytes = copy_from_user_nmi(&frame, fp, sizeof(frame)); + if (bytes != sizeof(frame)) + break; + + if (fp < compat_ptr(regs->sp)) + break; - return bytes == sizeof(*frame); + callchain_store(entry, frame.return_address); + fp = compat_ptr(frame.next_frame); + } + return 1; } +#else +static inline int +perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry) +{ + return 0; +} +#endif static void perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) @@ -2415,11 +2444,16 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) callchain_store(entry, PERF_CONTEXT_USER); callchain_store(entry, regs->ip); + if (perf_callchain_user32(regs, entry)) + return; + while (entry->nr < PERF_MAX_STACK_DEPTH) { + unsigned long bytes; frame.next_frame = NULL; frame.return_address = 0; - if (!copy_stack_frame(fp, &frame)) + bytes = copy_from_user_nmi(&frame, fp, sizeof(frame)); + if (bytes != sizeof(frame)) break; if ((unsigned long)fp < regs->sp) diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h index 4fd1420..3f0a242 100644 --- a/arch/x86/kernel/dumpstack.h +++ b/arch/x86/kernel/dumpstack.h @@ -29,4 +29,9 @@ struct stack_frame { struct stack_frame *next_frame; unsigned long return_address; }; + +struct stack_frame_ia32 { + u32 next_frame; + u32 return_address; +}; #endif -- 1.7.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4. 2010-03-17 10:07 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4 Török Edwin @ 2010-03-30 23:18 ` Frederic Weisbecker 0 siblings, 0 replies; 22+ messages in thread From: Frederic Weisbecker @ 2010-03-30 23:18 UTC (permalink / raw) To: Török Edwin Cc: Ingo Molnar, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Paul Mackerras, x86, linux-kernel On Wed, Mar 17, 2010 at 12:07:16PM +0200, Török Edwin wrote: > When profiling a 32-bit process on a 64-bit kernel, callgraph tracing > stopped after the first function, because it has seen a garbage memory address > (tried to interpret the frame pointer, and return address as a 64-bit pointer). > > Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. > > Note that TIF_IA32 flag must be used, and not is_compat_task(), because the > latter is only set when the 32-bit process is executing a syscall, > which may not always be the case (when tracing page fault events for example). > > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Frederic Weisbecker <fweisbec@gmail.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> > Cc: Paul Mackerras <paulus@samba.org> > Cc: x86@kernel.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Török Edwin <edwintorok@gmail.com> Queued, thanks! ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2. 2010-03-17 8:48 ` Török Edwin 2010-03-17 8:49 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3 Török Edwin @ 2010-03-17 9:59 ` Ingo Molnar 1 sibling, 0 replies; 22+ messages in thread From: Ingo Molnar @ 2010-03-17 9:59 UTC (permalink / raw) To: T?r?k Edwin Cc: H. Peter Anvin, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Paul Mackerras, x86, linux-kernel * T?r?k Edwin <edwintorok@gmail.com> wrote: > >> + } > >> + return; > > > > This whole new block should probably be in a helper inline? > > To reduce indenting, or why? Mainly to increase cleanliness. We want code that isolates separate blocks of logic into separate sections. So a simple construct of: if (perf_callchain_user32(regs, entry)) return; will tell the reader of the code that 'ok, this is 32-bit compat stuff'. It doesnt, at this level, intrude into the main logic of that function. Then if we look at perf_callchain_user32() we see all the 32-bit compat stack frame walking details. Thanks, Ingo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels. 2010-03-16 14:49 ` Frederic Weisbecker 2010-03-16 15:02 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Török Edwin @ 2010-03-16 15:04 ` Török Edwin 1 sibling, 0 replies; 22+ messages in thread From: Török Edwin @ 2010-03-16 15:04 UTC (permalink / raw) To: Frederic Weisbecker Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras, x86, linux-kernel On 03/16/2010 04:49 PM, Frederic Weisbecker wrote: > On Mon, Mar 15, 2010 at 05:34:20PM +0200, Török Edwin wrote: >> When profiling a 32-bit process on a 64-bit kernel, callgraph tracing >> stopped after the first function, because it has seen a garbage memory address >> (tried to interpret the frame pointer, and return address as a 64-bit pointer). >> >> Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. >> >> Note that TIF_IA32 flag must be used, and not is_compat_task(), because the >> latter is only set when the 32-bit process is executing a syscall, >> which may not always be the case (when tracing page fault events for example). >> >> Signed-off-by: Török Edwin <edwintorok@gmail.com> >> --- >> arch/x86/kernel/cpu/perf_event.c | 33 +++++++++++++++++++++++++++++++++ >> 1 files changed, 33 insertions(+), 0 deletions(-) >> >> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c >> index 8c1c070..13ee83a 100644 >> --- a/arch/x86/kernel/cpu/perf_event.c >> +++ b/arch/x86/kernel/cpu/perf_event.c >> @@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame) >> return bytes == sizeof(*frame); >> } >> >> +struct stack_frame_ia32 { >> + u32 next_frame; >> + u32 return_address; >> +}; >> + >> +static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame) >> +{ >> + unsigned long bytes; >> + >> + bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame)); >> + >> + return bytes == sizeof(*frame); >> +} >> + >> static void >> perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) >> { >> @@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry) >> >> callchain_store(entry, PERF_CONTEXT_USER); >> callchain_store(entry, regs->ip); >> + if (test_thread_flag(TIF_IA32)) { >> + /* 32-bit process in 64-bit kernel. */ >> + u32 fp = regs->bp; >> + struct stack_frame_ia32 frame; >> + while (entry->nr < PERF_MAX_STACK_DEPTH) { >> + frame.next_frame = 0; >> + frame.return_address = 0; >> + >> + if (!copy_stack_frame_ia32(fp, &frame)) >> + break; >> + >> + if ((unsigned long)fp < regs->sp) >> + break; > > > > Shouldn't it be this? > > if (fp < (u32)regs->sp) > > As the high part of fp is zeroed in the cast but > the high part of regs->sp remains. I don't know what could be > there, but since the user doesn't use it, perhaps just garbage? > And that could messup the comparison. Agreed. I sent a new patch with that change. > > Otherwise the rest looks pretty good to me, nice catch. > Thanks. Best regards, --Edwin ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: fix callgraphs of 32-bit processes on 64-bit kernels. 2010-03-15 15:34 fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin 2010-03-15 15:34 ` [PATCH] perf: x86: " Török Edwin @ 2010-03-15 16:23 ` Török Edwin 2010-03-16 8:18 ` Török Edwin 1 sibling, 1 reply; 22+ messages in thread From: Török Edwin @ 2010-03-15 16:23 UTC (permalink / raw) To: Ingo Molnar Cc: Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras, x86, linux-kernel On 03/15/2010 05:34 PM, Török Edwin wrote: > Hi, > > Callgraph profiling 32-bit apps on a 64-bit kernel doesn't work. > The reason is that perf_callchain_user tries to read a stackframe with 64-bit > pointers, which is wrong for a 32-bit process. > > This patch fixes that, and I am almost able to get nice callgraph profiles > from 32-bit apps now! (except for some problems with perf itself when tracing > kernel modules, see [1]) > > Page-faults can be traced nicely (sid-ia32 is a 32-bit chroot): > > $ sudo perf record -e page-faults -f -g /home/edwin/sid-ia32/usr/bin/glxgears > $ sudo perf report > ... > 45.33% libc-2.10.2.so [.] __GI_memcpy > | > --- __GI_memcpy > _mesa_BufferDataARB > _mesa_meta_Clear > radeonUserClear > r700Clear > _mesa_Clear > 0x8049367 > 0x804a6ba > __libc_start_main > 0x8049111 > > 16.96% libc-2.10.2.so [.] __GI_memset > | > --- __GI_memset > _tnl_init_vertices > _swsetup_CreateContext > r600CreateContext > driCreateNewContext > dri2CreateNewContext > 0xf77ab7dd > 0xf7783c67 > 0xf778514c > 0x804974f > 0x804a33d > __libc_start_main > 0x8049111 > > And CPU cycles can be traced too in userspace: > $ sudo perf record -f -g /home/edwin/sid-ia32/usr/bin/glxgears > $ sudo perf report --sort comm,dso > [...] > 44.97% glxgears r600_dri.so > | > |--5.85%-- r700SendSPIState > | radeonEmitState > | r700DrawPrims > | | > | |--95.45%-- vbo_save_playback_vertex_list > | | execute_list > | | _mesa_CallList > | | neutral_CallList > | | | > | | |--38.10%-- 0x80494a8 > | | | 0x804a6ba > | | | __libc_start_main > | | | 0x8049111 > [....] > 40.00% glxgears [kernel] > | > |--3.14%-- copy_user_generic_string > | | > | |--71.70%-- 0xffffffffa01b4493 > | | 0xffffffffa01b7c0b > | | 0xffffffffa018b45b > | | 0xffffffffa00ca927 > | | 0xffffffffa01c524e > | | compat_sys_ioctl > | | sysenter_dispatch > | | 0xf77ca430 > | | drmCommandWriteRead > | | 0xf74d7ab5 > | | 0xf74d89a4 > | | rcommonFlushCmdBufLocked > | | rcommonFlushCmdBuf > | | radeonFlush > | | _mesa_flush > | | _mesa_Flush > | | 0xf775f270 > | | 0x804a6d5 > | | __libc_start_main > | | 0x8049111 > | | > | |--15.09%-- 0xffffffffa01c524e > | | compat_sys_ioctl > | | sysenter_dispatch > | | 0xf77ca430 > | | drmCommandWriteRead > > [1] But there is a problem with the perf tool: it can't trace addresses in > kernel modules. This is a problem regardless if the traced app is 32-bit or > 64-bit; and regardless if I do callgraph profiling or not. > See the above trace, where the kernel addresses have all ffffffffa0* without a > symbol name. > > If I look at /proc/kallsyms I can guess the symbols, for example > 0xffffffffa01b4493 is probably this one: > ffffffffa01b4411 t r600_cs_packet_parse [radeon] > > If I record/report without callgraph its the same problem: > [...] > 24.01% glxgears [kernel] [k] 0xffffffffa01b4ee9 > 3.96% glxgears libdrm_radeon.so.1.0.0 [.] cs_gem_write_reloc > 3.53% glxgears r600_dri.so [.] r700SendSPIState > 2.77% glxgears r600_dri.so [.] r700DrawPrims > 1.99% glxgears r600_dri.so [.] r700SendVSConsts > > Kernel symbol for 0xffffffffa01b4ee9 is not shown, I can guess it is this one > (hey it was an exact match!): > ffffffffa01b4ee9 t r600_packet3_check [radeon] > > It would be good if perf knew how to lookup symbols in kernel modules! BTW perf report -m -k /home/edwin/builds/linux-2.6/vmlinux doesn't show the symbols either. > > Best regards, > --Edwin ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: fix callgraphs of 32-bit processes on 64-bit kernels. 2010-03-15 16:23 ` Török Edwin @ 2010-03-16 8:18 ` Török Edwin 2010-03-16 8:47 ` Ingo Molnar 2010-03-16 9:57 ` [PATCH] perf: install into /usr/local by default Török Edwin 0 siblings, 2 replies; 22+ messages in thread From: Török Edwin @ 2010-03-16 8:18 UTC (permalink / raw) To: Ingo Molnar Cc: Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras, x86, linux-kernel On 03/15/2010 06:23 PM, Török Edwin wrote: > On 03/15/2010 05:34 PM, Török Edwin wrote: >> >> It would be good if perf knew how to lookup symbols in kernel modules! > > BTW perf report -m -k /home/edwin/builds/linux-2.6/vmlinux doesn't show > the symbols either. I always forget that, unlike every other program, perf doesn't install by default to /usr/local! So I was running the wrong version of perf (from an older kernel), since perf was installed to $HOME/bin (which of course isn't in sudo's path). Sorry for the confusion, the 2.6.33 perf DOES know how to lookup the symbols: 9.92% glxgears [radeon] [k] r600_packet3_check | --- r600_packet3_check | |--96.80%-- r600_cs_parse | radeon_cs_ioctl | drm_ioctl | radeon_kms_compat_ioctl | compat_sys_ioctl | sysenter_dispatch | 0xf7759430 | drmCommandWriteRead | cs_gem_emit | radeon_cs_emit | rcommonFlushCmdBufLocked | rcommonFlushCmdBuf | radeonFlush | _mesa_flush | _mesa_Flush | 0xf76ee270 | 0x804a6d5 | __libc_start_main | 0x8049111 [...] Best regards, --Edwin ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: fix callgraphs of 32-bit processes on 64-bit kernels. 2010-03-16 8:18 ` Török Edwin @ 2010-03-16 8:47 ` Ingo Molnar 2010-03-16 10:17 ` Török Edwin 2010-03-16 9:57 ` [PATCH] perf: install into /usr/local by default Török Edwin 1 sibling, 1 reply; 22+ messages in thread From: Ingo Molnar @ 2010-03-16 8:47 UTC (permalink / raw) To: T??r??k Edwin Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras, x86, linux-kernel, Fr??d??ric Weisbecker, Arnaldo Carvalho de Melo * T??r??k Edwin <edwintorok@gmail.com> wrote: > On 03/15/2010 06:23 PM, T??r??k Edwin wrote: > > On 03/15/2010 05:34 PM, T??r??k Edwin wrote: > >> > >> It would be good if perf knew how to lookup symbols in kernel modules! > > > > BTW perf report -m -k /home/edwin/builds/linux-2.6/vmlinux doesn't show > > the symbols either. > > I always forget that, unlike every other program, perf doesn't install > by default to /usr/local! > So I was running the wrong version of perf (from an older kernel), since > perf was installed to $HOME/bin (which of course isn't in sudo's path). > > Sorry for the confusion, the 2.6.33 perf DOES know how to lookup the > symbols: > 9.92% glxgears [radeon] [k] > r600_packet3_check > | > --- r600_packet3_check > | > |--96.80%-- r600_cs_parse Ok, great! I suspect we could install into /usr/local too. Do you want to send a patch for that? Ingo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: fix callgraphs of 32-bit processes on 64-bit kernels. 2010-03-16 8:47 ` Ingo Molnar @ 2010-03-16 10:17 ` Török Edwin 2010-03-16 10:23 ` Ingo Molnar 0 siblings, 1 reply; 22+ messages in thread From: Török Edwin @ 2010-03-16 10:17 UTC (permalink / raw) To: Ingo Molnar Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras, x86, linux-kernel, Fr??d??ric Weisbecker, Arnaldo Carvalho de Melo On 03/16/2010 10:47 AM, Ingo Molnar wrote: > * T??r??k Edwin <edwintorok@gmail.com> wrote: > >> On 03/15/2010 06:23 PM, T??r??k Edwin wrote: >>> On 03/15/2010 05:34 PM, T??r??k Edwin wrote: >>>> It would be good if perf knew how to lookup symbols in kernel modules! >>> BTW perf report -m -k /home/edwin/builds/linux-2.6/vmlinux doesn't show >>> the symbols either. >> I always forget that, unlike every other program, perf doesn't install >> by default to /usr/local! >> So I was running the wrong version of perf (from an older kernel), since >> perf was installed to $HOME/bin (which of course isn't in sudo's path). >> >> Sorry for the confusion, the 2.6.33 perf DOES know how to lookup the >> symbols: >> 9.92% glxgears [radeon] [k] >> r600_packet3_check >> | >> --- r600_packet3_check >> | >> |--96.80%-- r600_cs_parse > > Ok, great! BTW the patch I sent yesterday for tracing 32-bit apps is still needed, since that is a kernel patch, and it wasn't due to using the wrong perf. > > I suspect we could install into /usr/local too. Do you want to send a patch > for that? Sent. BTW I think perf would need some documentation on how to install, and what packages you need to build everything, what permissions it needs to run, etc. 1. manpages For example by default the manpages don't get built and installed, so perf report --help doesn't work. It needs a 'make man', and 'make install-man'. This is fine, because they need asciidoc and xmlto which aren't usually installed on every system. But there should be some documentation mentioning this. 2. privileges I just found out that perf works without root privileges (I just assumed it needed root, since oprofile needs it). 3. non-working targets? Also there are some targets in Documentation that can't be built due to missing files, like pdf which needs a non-existent user-manual.xml. 4. unresolved symbols Sometimes I get symbol addresses that are not resolved, like this: 57.03% :32216 7fc7dc0acfa6 [.] 0x007fc7dc0acfa6 12.39% :32216 [radeon] [k] r600_packet3_check 4.92% :32216 [radeon] [k] r600_cs_packet_parse 2.70% :32216 [radeon] [k] r600_cs_parse Is this due to ASLR? Does perf need ASLR disabled? That address corresponds to this: 7fc7dc07e000-7fc7dc2ef000 r-xp 00000000 fd:03 27713 /opt/xorg/lib/dri/r600_dri.so It'd of course be nice if there was a distro package for perf, I think I'll file a RFP in Debian for that. Best regards, --Edwin ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: fix callgraphs of 32-bit processes on 64-bit kernels. 2010-03-16 10:17 ` Török Edwin @ 2010-03-16 10:23 ` Ingo Molnar 0 siblings, 0 replies; 22+ messages in thread From: Ingo Molnar @ 2010-03-16 10:23 UTC (permalink / raw) To: T?r?k Edwin Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras, x86, linux-kernel, Fr??d??ric Weisbecker, Arnaldo Carvalho de Melo * T?r?k Edwin <edwintorok@gmail.com> wrote: > On 03/16/2010 10:47 AM, Ingo Molnar wrote: > > * T??r??k Edwin <edwintorok@gmail.com> wrote: > > > >> On 03/15/2010 06:23 PM, T??r??k Edwin wrote: > >>> On 03/15/2010 05:34 PM, T??r??k Edwin wrote: > >>>> It would be good if perf knew how to lookup symbols in kernel modules! > >>> BTW perf report -m -k /home/edwin/builds/linux-2.6/vmlinux doesn't show > >>> the symbols either. > >> I always forget that, unlike every other program, perf doesn't install > >> by default to /usr/local! > >> So I was running the wrong version of perf (from an older kernel), since > >> perf was installed to $HOME/bin (which of course isn't in sudo's path). > >> > >> Sorry for the confusion, the 2.6.33 perf DOES know how to lookup the > >> symbols: > >> 9.92% glxgears [radeon] [k] > >> r600_packet3_check > >> | > >> --- r600_packet3_check > >> | > >> |--96.80%-- r600_cs_parse > > > > Ok, great! > > BTW the patch I sent yesterday for tracing 32-bit apps is still needed, > since that is a kernel patch, and it wasn't due to using the wrong perf. I've Cc:-ed Frederic for that bug. (Frederic has written a good deal of that code) > > I suspect we could install into /usr/local too. Do you want to send a patch > > for that? > > Sent. > > BTW I think perf would need some documentation on how to install, and what > packages you need to build everything, what permissions it needs to run, > etc. Agreed. (I've Cc:-ed Arnaldo who has a pending fix in this area.) Ingo ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] perf: install into /usr/local by default. 2010-03-16 8:18 ` Török Edwin 2010-03-16 8:47 ` Ingo Molnar @ 2010-03-16 9:57 ` Török Edwin 2010-03-16 10:10 ` Ingo Molnar 1 sibling, 1 reply; 22+ messages in thread From: Török Edwin @ 2010-03-16 9:57 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Paul Mackerras, linux-kernel, Török Edwin It was confusing to install into $(HOME)/bin, especially since there was no documentation mentioning where perf gets installed by default. So install to /usr/local by default, as other programs do, and allow users to override the install location by specifying the prefix explicitly. Signed-off-by: Török Edwin <edwintorok@gmail.com> --- tools/perf/Makefile | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/tools/perf/Makefile b/tools/perf/Makefile index 2e7fa3a..8e8c199 100644 --- a/tools/perf/Makefile +++ b/tools/perf/Makefile @@ -216,7 +216,7 @@ STRIP ?= strip # runtime figures out where they are based on the path to the executable. # This can help installing the suite in a relocatable way. -prefix = $(HOME) +prefix = /usr/local bindir_relative = bin bindir = $(prefix)/$(bindir_relative) mandir = share/man -- 1.7.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: install into /usr/local by default. 2010-03-16 9:57 ` [PATCH] perf: install into /usr/local by default Török Edwin @ 2010-03-16 10:10 ` Ingo Molnar 2010-03-16 10:20 ` Avi Kivity 2010-03-16 10:24 ` Török Edwin 0 siblings, 2 replies; 22+ messages in thread From: Ingo Molnar @ 2010-03-16 10:10 UTC (permalink / raw) To: T??r??k Edwin Cc: Peter Zijlstra, Paul Mackerras, linux-kernel, =?unknown-8bit?B?RnLDqWTDqXJpYw==?= Weisbecker, Arnaldo Carvalho de Melo * T??r??k Edwin <edwintorok@gmail.com> wrote: > It was confusing to install into $(HOME)/bin, especially since there was > no documentation mentioning where perf gets installed by default. > So install to /usr/local by default, as other programs do, and allow users to > override the install location by specifying the prefix explicitly. > > Signed-off-by: T??r??k Edwin <edwintorok@gmail.com> > --- > tools/perf/Makefile | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/tools/perf/Makefile b/tools/perf/Makefile > index 2e7fa3a..8e8c199 100644 > --- a/tools/perf/Makefile > +++ b/tools/perf/Makefile > @@ -216,7 +216,7 @@ STRIP ?= strip > # runtime figures out where they are based on the path to the executable. > # This can help installing the suite in a relocatable way. > > -prefix = $(HOME) > +prefix = /usr/local > bindir_relative = bin > bindir = $(prefix)/$(bindir_relative) > mandir = share/man Btw., we inherited that default prefix from the Git project. Is there a way to get it into ~/bin/ if the user does not have permission to /usr/local ? (i.e. doesnt run it as root) That's a really convenient aspect of doing a 'make install' as user. (Which i tend to do in most cases) Ingo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: install into /usr/local by default. 2010-03-16 10:10 ` Ingo Molnar @ 2010-03-16 10:20 ` Avi Kivity 2010-03-16 10:25 ` Ingo Molnar 2010-03-16 10:24 ` Török Edwin 1 sibling, 1 reply; 22+ messages in thread From: Avi Kivity @ 2010-03-16 10:20 UTC (permalink / raw) To: Ingo Molnar Cc: T??r??k Edwin, Peter Zijlstra, Paul Mackerras, linux-kernel, Frédéric Weisbecker, Arnaldo Carvalho de Melo On 03/16/2010 12:10 PM, Ingo Molnar wrote: > * T??r??k Edwin<edwintorok@gmail.com> wrote: > > >> It was confusing to install into $(HOME)/bin, especially since there was >> no documentation mentioning where perf gets installed by default. >> So install to /usr/local by default, as other programs do, and allow users to >> override the install location by specifying the prefix explicitly. >> >> Signed-off-by: T??r??k Edwin<edwintorok@gmail.com> >> --- >> tools/perf/Makefile | 2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/tools/perf/Makefile b/tools/perf/Makefile >> index 2e7fa3a..8e8c199 100644 >> --- a/tools/perf/Makefile >> +++ b/tools/perf/Makefile >> @@ -216,7 +216,7 @@ STRIP ?= strip >> # runtime figures out where they are based on the path to the executable. >> # This can help installing the suite in a relocatable way. >> >> -prefix = $(HOME) >> +prefix = /usr/local >> bindir_relative = bin >> bindir = $(prefix)/$(bindir_relative) >> mandir = share/man >> > Btw., we inherited that default prefix from the Git project. > > Is there a way to get it into ~/bin/ if the user does not have permission to > /usr/local ? (i.e. doesnt run it as root) > > That's a really convenient aspect of doing a 'make install' as user. (Which i > tend to do in most cases) > What about people (like me) who do 'make && sudo make install'? Can we make it position independent and derive the path from /proc/$$/exe? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: install into /usr/local by default. 2010-03-16 10:20 ` Avi Kivity @ 2010-03-16 10:25 ` Ingo Molnar 0 siblings, 0 replies; 22+ messages in thread From: Ingo Molnar @ 2010-03-16 10:25 UTC (permalink / raw) To: Avi Kivity Cc: T??r??k Edwin, Peter Zijlstra, Paul Mackerras, linux-kernel, Fr?d?ric Weisbecker, Arnaldo Carvalho de Melo * Avi Kivity <avi@redhat.com> wrote: > On 03/16/2010 12:10 PM, Ingo Molnar wrote: > >* T??r??k Edwin<edwintorok@gmail.com> wrote: > > > >>It was confusing to install into $(HOME)/bin, especially since there was > >>no documentation mentioning where perf gets installed by default. > >>So install to /usr/local by default, as other programs do, and allow users to > >>override the install location by specifying the prefix explicitly. > >> > >>Signed-off-by: T??r??k Edwin<edwintorok@gmail.com> > >>--- > >> tools/perf/Makefile | 2 +- > >> 1 files changed, 1 insertions(+), 1 deletions(-) > >> > >>diff --git a/tools/perf/Makefile b/tools/perf/Makefile > >>index 2e7fa3a..8e8c199 100644 > >>--- a/tools/perf/Makefile > >>+++ b/tools/perf/Makefile > >>@@ -216,7 +216,7 @@ STRIP ?= strip > >> # runtime figures out where they are based on the path to the executable. > >> # This can help installing the suite in a relocatable way. > >> > >>-prefix = $(HOME) > >>+prefix = /usr/local > >> bindir_relative = bin > >> bindir = $(prefix)/$(bindir_relative) > >> mandir = share/man > >Btw., we inherited that default prefix from the Git project. > > > >Is there a way to get it into ~/bin/ if the user does not have permission to > >/usr/local ? (i.e. doesnt run it as root) > > > >That's a really convenient aspect of doing a 'make install' as user. (Which i > >tend to do in most cases) > > What about people (like me) who do 'make && sudo make install'? I'd like everyone to be happy :-) In case of irreconcilable differences i prefer the creation of two parallel universes, one for each variant. This does not seem to be such a case though: > Can we make it position independent and derive the path from /proc/$$/exe? Sounds like a useful approach ... Ingo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] perf: install into /usr/local by default. 2010-03-16 10:10 ` Ingo Molnar 2010-03-16 10:20 ` Avi Kivity @ 2010-03-16 10:24 ` Török Edwin 1 sibling, 0 replies; 22+ messages in thread From: Török Edwin @ 2010-03-16 10:24 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Paul Mackerras, linux-kernel, Frédéric Weisbecker, Arnaldo Carvalho de Melo, Avi Kivity On 03/16/2010 12:10 PM, Ingo Molnar wrote: > * T??r??k Edwin <edwintorok@gmail.com> wrote: > >> It was confusing to install into $(HOME)/bin, especially since there was >> no documentation mentioning where perf gets installed by default. >> So install to /usr/local by default, as other programs do, and allow users to >> override the install location by specifying the prefix explicitly. >> >> Signed-off-by: T??r??k Edwin <edwintorok@gmail.com> >> --- >> tools/perf/Makefile | 2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/tools/perf/Makefile b/tools/perf/Makefile >> index 2e7fa3a..8e8c199 100644 >> --- a/tools/perf/Makefile >> +++ b/tools/perf/Makefile >> @@ -216,7 +216,7 @@ STRIP ?= strip >> # runtime figures out where they are based on the path to the executable. >> # This can help installing the suite in a relocatable way. >> >> -prefix = $(HOME) >> +prefix = /usr/local >> bindir_relative = bin >> bindir = $(prefix)/$(bindir_relative) >> mandir = share/man > > Btw., we inherited that default prefix from the Git project. > > Is there a way to get it into ~/bin/ if the user does not have permission to > /usr/local ? (i.e. doesnt run it as root) That is complicated, I usually run make as a normal user, and only do make install as root (or sudo make install). I do that for the kernel itself, and usually every program I build (I don't like compiling as root). > > That's a really convenient aspect of doing a 'make install' as user. (Which i > tend to do in most cases) On 03/16/2010 12:20 PM, Avi Kivity wrote: > What about people (like me) who do 'make && sudo make install'? > > Can we make it position independent and derive the path from /proc/$$/exe? > There is a RUNTIME_PREFIX define (undocumented...) that seems to do something like that. Are there any security implications of using that by default? Best regards, --Edwin ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2010-03-31 0:07 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-15 15:34 fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin 2010-03-15 15:34 ` [PATCH] perf: x86: " Török Edwin 2010-03-16 14:49 ` Frederic Weisbecker 2010-03-16 15:02 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Török Edwin 2010-03-16 17:05 ` Ingo Molnar 2010-03-17 8:48 ` Török Edwin 2010-03-17 8:49 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3 Török Edwin 2010-03-17 9:54 ` Ingo Molnar 2010-03-17 10:07 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4 Török Edwin 2010-03-30 23:18 ` Frederic Weisbecker 2010-03-17 9:59 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Ingo Molnar 2010-03-16 15:04 ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin 2010-03-15 16:23 ` Török Edwin 2010-03-16 8:18 ` Török Edwin 2010-03-16 8:47 ` Ingo Molnar 2010-03-16 10:17 ` Török Edwin 2010-03-16 10:23 ` Ingo Molnar 2010-03-16 9:57 ` [PATCH] perf: install into /usr/local by default Török Edwin 2010-03-16 10:10 ` Ingo Molnar 2010-03-16 10:20 ` Avi Kivity 2010-03-16 10:25 ` Ingo Molnar 2010-03-16 10:24 ` Török Edwin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox