public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* fix callgraphs of 32-bit processes on 64-bit kernels.
@ 2010-03-15 15:34 Török Edwin
  2010-03-15 15:34 ` [PATCH] perf: x86: " Török Edwin
  2010-03-15 16:23 ` Török Edwin
  0 siblings, 2 replies; 22+ messages in thread
From: Török Edwin @ 2010-03-15 15:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras,
	x86, linux-kernel

Hi,

Callgraph profiling 32-bit apps on a 64-bit kernel doesn't work.
The reason is that perf_callchain_user tries to read a stackframe with 64-bit
pointers, which is wrong for a 32-bit process.

This patch fixes that, and I am almost able to get nice callgraph profiles
from 32-bit apps now! (except for some problems with perf itself when tracing
kernel modules, see [1])

Page-faults can be traced nicely (sid-ia32 is a 32-bit chroot):

$ sudo perf record -e page-faults -f -g /home/edwin/sid-ia32/usr/bin/glxgears
$ sudo perf report
...
    45.33%  libc-2.10.2.so                   [.] __GI_memcpy
            |
            --- __GI_memcpy
                _mesa_BufferDataARB
                _mesa_meta_Clear
                radeonUserClear
                r700Clear
                _mesa_Clear
                0x8049367
                0x804a6ba
                __libc_start_main
                0x8049111

    16.96%  libc-2.10.2.so                   [.] __GI_memset
            |
            --- __GI_memset
                _tnl_init_vertices
                _swsetup_CreateContext
                r600CreateContext
                driCreateNewContext
                dri2CreateNewContext
                0xf77ab7dd
                0xf7783c67
                0xf778514c
                0x804974f
                0x804a33d
                __libc_start_main
                0x8049111

And CPU cycles can be traced too in userspace:
$ sudo perf record -f -g /home/edwin/sid-ia32/usr/bin/glxgears
$ sudo perf report --sort comm,dso
    [...]
    44.97%  glxgears  r600_dri.so
            |
            |--5.85%-- r700SendSPIState
            |          radeonEmitState
            |          r700DrawPrims
            |          |
            |          |--95.45%-- vbo_save_playback_vertex_list
            |          |          execute_list
            |          |          _mesa_CallList
            |          |          neutral_CallList
            |          |          |
            |          |          |--38.10%-- 0x80494a8
            |          |          |          0x804a6ba
            |          |          |          __libc_start_main
            |          |          |          0x8049111
    [....]
    40.00%  glxgears  [kernel]
            |
            |--3.14%-- copy_user_generic_string
            |          |
            |          |--71.70%-- 0xffffffffa01b4493
            |          |          0xffffffffa01b7c0b
            |          |          0xffffffffa018b45b
            |          |          0xffffffffa00ca927
            |          |          0xffffffffa01c524e
            |          |          compat_sys_ioctl
            |          |          sysenter_dispatch
            |          |          0xf77ca430
            |          |          drmCommandWriteRead
            |          |          0xf74d7ab5
            |          |          0xf74d89a4
            |          |          rcommonFlushCmdBufLocked
            |          |          rcommonFlushCmdBuf
            |          |          radeonFlush
            |          |          _mesa_flush
            |          |          _mesa_Flush
            |          |          0xf775f270
            |          |          0x804a6d5
            |          |          __libc_start_main
            |          |          0x8049111
            |          |
            |          |--15.09%-- 0xffffffffa01c524e
            |          |          compat_sys_ioctl
            |          |          sysenter_dispatch
            |          |          0xf77ca430
            |          |          drmCommandWriteRead

[1] But there is a problem with the perf tool: it can't trace addresses in
kernel modules. This is a problem regardless if the traced app is 32-bit or
64-bit; and regardless if I do callgraph profiling or not.
See the above trace, where the kernel addresses have all ffffffffa0* without a
symbol name.

If I look at /proc/kallsyms I can guess the symbols, for example
0xffffffffa01b4493 is probably this one:
  ffffffffa01b4411 t r600_cs_packet_parse	[radeon]

If I record/report without callgraph its the same problem:
[...]
    24.01%  glxgears  [kernel]                           [k] 0xffffffffa01b4ee9
     3.96%  glxgears  libdrm_radeon.so.1.0.0             [.] cs_gem_write_reloc
     3.53%  glxgears  r600_dri.so                        [.] r700SendSPIState
     2.77%  glxgears  r600_dri.so                        [.] r700DrawPrims
     1.99%  glxgears  r600_dri.so                        [.] r700SendVSConsts

Kernel symbol for 0xffffffffa01b4ee9 is not shown, I can guess it is this one
(hey it was an exact match!):
  ffffffffa01b4ee9 t r600_packet3_check	[radeon]

It would be good if perf knew how to lookup symbols in kernel modules!

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels.
  2010-03-15 15:34 fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin
@ 2010-03-15 15:34 ` Török Edwin
  2010-03-16 14:49   ` Frederic Weisbecker
  2010-03-15 16:23 ` Török Edwin
  1 sibling, 1 reply; 22+ messages in thread
From: Török Edwin @ 2010-03-15 15:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras,
	x86, linux-kernel, Török Edwin

When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
stopped after the first function, because it has seen a garbage memory address
(tried to interpret the frame pointer, and return address as a 64-bit pointer).

Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set.

Note that TIF_IA32 flag must be used, and not is_compat_task(), because the
latter is only set when the 32-bit process is executing a syscall,
which may not always be the case (when tracing page fault events for example).

Signed-off-by: Török Edwin <edwintorok@gmail.com>
---
 arch/x86/kernel/cpu/perf_event.c |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8c1c070..13ee83a 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
 	return bytes == sizeof(*frame);
 }
 
+struct stack_frame_ia32 {
+    u32 next_frame;
+    u32 return_address;
+};
+
+static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame)
+{
+	unsigned long bytes;
+
+	bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame));
+
+	return bytes == sizeof(*frame);
+}
+
 static void
 perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
 {
@@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
 
 	callchain_store(entry, PERF_CONTEXT_USER);
 	callchain_store(entry, regs->ip);
+	if (test_thread_flag(TIF_IA32)) {
+	    /* 32-bit process in 64-bit kernel. */
+	    u32 fp = regs->bp;
+	    struct stack_frame_ia32 frame;
+	    while (entry->nr < PERF_MAX_STACK_DEPTH) {
+		frame.next_frame     = 0;
+		frame.return_address = 0;
+
+		if (!copy_stack_frame_ia32(fp, &frame))
+			break;
+
+		if ((unsigned long)fp < regs->sp)
+			break;
+
+		callchain_store(entry, frame.return_address);
+		fp = frame.next_frame;
+	    }
+	    return;
+	}
 
 	while (entry->nr < PERF_MAX_STACK_DEPTH) {
 		frame.next_frame	     = NULL;
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: fix callgraphs of 32-bit processes on 64-bit kernels.
  2010-03-15 15:34 fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin
  2010-03-15 15:34 ` [PATCH] perf: x86: " Török Edwin
@ 2010-03-15 16:23 ` Török Edwin
  2010-03-16  8:18   ` Török Edwin
  1 sibling, 1 reply; 22+ messages in thread
From: Török Edwin @ 2010-03-15 16:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras,
	x86, linux-kernel

On 03/15/2010 05:34 PM, Török Edwin wrote:
> Hi,
> 
> Callgraph profiling 32-bit apps on a 64-bit kernel doesn't work.
> The reason is that perf_callchain_user tries to read a stackframe with 64-bit
> pointers, which is wrong for a 32-bit process.
> 
> This patch fixes that, and I am almost able to get nice callgraph profiles
> from 32-bit apps now! (except for some problems with perf itself when tracing
> kernel modules, see [1])
> 
> Page-faults can be traced nicely (sid-ia32 is a 32-bit chroot):
> 
> $ sudo perf record -e page-faults -f -g /home/edwin/sid-ia32/usr/bin/glxgears
> $ sudo perf report
> ...
>     45.33%  libc-2.10.2.so                   [.] __GI_memcpy
>             |
>             --- __GI_memcpy
>                 _mesa_BufferDataARB
>                 _mesa_meta_Clear
>                 radeonUserClear
>                 r700Clear
>                 _mesa_Clear
>                 0x8049367
>                 0x804a6ba
>                 __libc_start_main
>                 0x8049111
> 
>     16.96%  libc-2.10.2.so                   [.] __GI_memset
>             |
>             --- __GI_memset
>                 _tnl_init_vertices
>                 _swsetup_CreateContext
>                 r600CreateContext
>                 driCreateNewContext
>                 dri2CreateNewContext
>                 0xf77ab7dd
>                 0xf7783c67
>                 0xf778514c
>                 0x804974f
>                 0x804a33d
>                 __libc_start_main
>                 0x8049111
> 
> And CPU cycles can be traced too in userspace:
> $ sudo perf record -f -g /home/edwin/sid-ia32/usr/bin/glxgears
> $ sudo perf report --sort comm,dso
>     [...]
>     44.97%  glxgears  r600_dri.so
>             |
>             |--5.85%-- r700SendSPIState
>             |          radeonEmitState
>             |          r700DrawPrims
>             |          |
>             |          |--95.45%-- vbo_save_playback_vertex_list
>             |          |          execute_list
>             |          |          _mesa_CallList
>             |          |          neutral_CallList
>             |          |          |
>             |          |          |--38.10%-- 0x80494a8
>             |          |          |          0x804a6ba
>             |          |          |          __libc_start_main
>             |          |          |          0x8049111
>     [....]
>     40.00%  glxgears  [kernel]
>             |
>             |--3.14%-- copy_user_generic_string
>             |          |
>             |          |--71.70%-- 0xffffffffa01b4493
>             |          |          0xffffffffa01b7c0b
>             |          |          0xffffffffa018b45b
>             |          |          0xffffffffa00ca927
>             |          |          0xffffffffa01c524e
>             |          |          compat_sys_ioctl
>             |          |          sysenter_dispatch
>             |          |          0xf77ca430
>             |          |          drmCommandWriteRead
>             |          |          0xf74d7ab5
>             |          |          0xf74d89a4
>             |          |          rcommonFlushCmdBufLocked
>             |          |          rcommonFlushCmdBuf
>             |          |          radeonFlush
>             |          |          _mesa_flush
>             |          |          _mesa_Flush
>             |          |          0xf775f270
>             |          |          0x804a6d5
>             |          |          __libc_start_main
>             |          |          0x8049111
>             |          |
>             |          |--15.09%-- 0xffffffffa01c524e
>             |          |          compat_sys_ioctl
>             |          |          sysenter_dispatch
>             |          |          0xf77ca430
>             |          |          drmCommandWriteRead
> 
> [1] But there is a problem with the perf tool: it can't trace addresses in
> kernel modules. This is a problem regardless if the traced app is 32-bit or
> 64-bit; and regardless if I do callgraph profiling or not.
> See the above trace, where the kernel addresses have all ffffffffa0* without a
> symbol name.
> 
> If I look at /proc/kallsyms I can guess the symbols, for example
> 0xffffffffa01b4493 is probably this one:
>   ffffffffa01b4411 t r600_cs_packet_parse	[radeon]
> 
> If I record/report without callgraph its the same problem:
> [...]
>     24.01%  glxgears  [kernel]                           [k] 0xffffffffa01b4ee9
>      3.96%  glxgears  libdrm_radeon.so.1.0.0             [.] cs_gem_write_reloc
>      3.53%  glxgears  r600_dri.so                        [.] r700SendSPIState
>      2.77%  glxgears  r600_dri.so                        [.] r700DrawPrims
>      1.99%  glxgears  r600_dri.so                        [.] r700SendVSConsts
> 
> Kernel symbol for 0xffffffffa01b4ee9 is not shown, I can guess it is this one
> (hey it was an exact match!):
>   ffffffffa01b4ee9 t r600_packet3_check	[radeon]
> 
> It would be good if perf knew how to lookup symbols in kernel modules!

BTW perf report -m -k /home/edwin/builds/linux-2.6/vmlinux doesn't show
the symbols either.

> 
> Best regards,
> --Edwin


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: fix callgraphs of 32-bit processes on 64-bit kernels.
  2010-03-15 16:23 ` Török Edwin
@ 2010-03-16  8:18   ` Török Edwin
  2010-03-16  8:47     ` Ingo Molnar
  2010-03-16  9:57     ` [PATCH] perf: install into /usr/local by default Török Edwin
  0 siblings, 2 replies; 22+ messages in thread
From: Török Edwin @ 2010-03-16  8:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Paul Mackerras,
	x86, linux-kernel

On 03/15/2010 06:23 PM, Török Edwin wrote:
> On 03/15/2010 05:34 PM, Török Edwin wrote:
>>
>> It would be good if perf knew how to lookup symbols in kernel modules!
> 
> BTW perf report -m -k /home/edwin/builds/linux-2.6/vmlinux doesn't show
> the symbols either.

I always forget that, unlike every other program, perf doesn't install
by default to /usr/local!
So I was running the wrong version of perf (from an older kernel), since
perf was installed to $HOME/bin (which of course isn't in sudo's path).

Sorry for the confusion, the 2.6.33 perf DOES know how to lookup the
symbols:
    9.92%  glxgears  [radeon]                           [k]
r600_packet3_check
            |
            --- r600_packet3_check
               |
               |--96.80%-- r600_cs_parse
               |          radeon_cs_ioctl
               |          drm_ioctl
               |          radeon_kms_compat_ioctl
               |          compat_sys_ioctl
               |          sysenter_dispatch
               |          0xf7759430
               |          drmCommandWriteRead
               |          cs_gem_emit
               |          radeon_cs_emit
               |          rcommonFlushCmdBufLocked
               |          rcommonFlushCmdBuf
               |          radeonFlush
               |          _mesa_flush
               |          _mesa_Flush
               |          0xf76ee270
               |          0x804a6d5
               |          __libc_start_main
               |          0x8049111
[...]

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: fix callgraphs of 32-bit processes on 64-bit kernels.
  2010-03-16  8:18   ` Török Edwin
@ 2010-03-16  8:47     ` Ingo Molnar
  2010-03-16 10:17       ` Török Edwin
  2010-03-16  9:57     ` [PATCH] perf: install into /usr/local by default Török Edwin
  1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2010-03-16  8:47 UTC (permalink / raw)
  To: T??r??k Edwin
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra,
	Paul Mackerras, x86, linux-kernel, Fr??d??ric Weisbecker,
	Arnaldo Carvalho de Melo


* T??r??k Edwin <edwintorok@gmail.com> wrote:

> On 03/15/2010 06:23 PM, T??r??k Edwin wrote:
> > On 03/15/2010 05:34 PM, T??r??k Edwin wrote:
> >>
> >> It would be good if perf knew how to lookup symbols in kernel modules!
> > 
> > BTW perf report -m -k /home/edwin/builds/linux-2.6/vmlinux doesn't show
> > the symbols either.
> 
> I always forget that, unlike every other program, perf doesn't install
> by default to /usr/local!
> So I was running the wrong version of perf (from an older kernel), since
> perf was installed to $HOME/bin (which of course isn't in sudo's path).
> 
> Sorry for the confusion, the 2.6.33 perf DOES know how to lookup the
> symbols:
>     9.92%  glxgears  [radeon]                           [k]
> r600_packet3_check
>             |
>             --- r600_packet3_check
>                |
>                |--96.80%-- r600_cs_parse

Ok, great!

I suspect we could install into /usr/local too. Do you want to send a patch 
for that?

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] perf: install into /usr/local by default.
  2010-03-16  8:18   ` Török Edwin
  2010-03-16  8:47     ` Ingo Molnar
@ 2010-03-16  9:57     ` Török Edwin
  2010-03-16 10:10       ` Ingo Molnar
  1 sibling, 1 reply; 22+ messages in thread
From: Török Edwin @ 2010-03-16  9:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Paul Mackerras, linux-kernel,
	Török Edwin

It was confusing to install into $(HOME)/bin, especially since there was
no documentation mentioning where perf gets installed by default.
So install to /usr/local by default, as other programs do, and allow users to
override the install location by specifying the prefix explicitly.

Signed-off-by: Török Edwin <edwintorok@gmail.com>
---
 tools/perf/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 2e7fa3a..8e8c199 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -216,7 +216,7 @@ STRIP ?= strip
 # runtime figures out where they are based on the path to the executable.
 # This can help installing the suite in a relocatable way.
 
-prefix = $(HOME)
+prefix = /usr/local
 bindir_relative = bin
 bindir = $(prefix)/$(bindir_relative)
 mandir = share/man
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: install into /usr/local by default.
  2010-03-16  9:57     ` [PATCH] perf: install into /usr/local by default Török Edwin
@ 2010-03-16 10:10       ` Ingo Molnar
  2010-03-16 10:20         ` Avi Kivity
  2010-03-16 10:24         ` Török Edwin
  0 siblings, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2010-03-16 10:10 UTC (permalink / raw)
  To: T??r??k Edwin
  Cc: Peter Zijlstra, Paul Mackerras, linux-kernel,
	=?unknown-8bit?B?RnLDqWTDqXJpYw==?= Weisbecker,
	Arnaldo Carvalho de Melo


* T??r??k Edwin <edwintorok@gmail.com> wrote:

> It was confusing to install into $(HOME)/bin, especially since there was
> no documentation mentioning where perf gets installed by default.
> So install to /usr/local by default, as other programs do, and allow users to
> override the install location by specifying the prefix explicitly.
> 
> Signed-off-by: T??r??k Edwin <edwintorok@gmail.com>
> ---
>  tools/perf/Makefile |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> index 2e7fa3a..8e8c199 100644
> --- a/tools/perf/Makefile
> +++ b/tools/perf/Makefile
> @@ -216,7 +216,7 @@ STRIP ?= strip
>  # runtime figures out where they are based on the path to the executable.
>  # This can help installing the suite in a relocatable way.
>  
> -prefix = $(HOME)
> +prefix = /usr/local
>  bindir_relative = bin
>  bindir = $(prefix)/$(bindir_relative)
>  mandir = share/man

Btw., we inherited that default prefix from the Git project.

Is there a way to get it into ~/bin/ if the user does not have permission to 
/usr/local ? (i.e. doesnt run it as root)

That's a really convenient aspect of doing a 'make install' as user. (Which i 
tend to do in most cases)

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: fix callgraphs of 32-bit processes on 64-bit kernels.
  2010-03-16  8:47     ` Ingo Molnar
@ 2010-03-16 10:17       ` Török Edwin
  2010-03-16 10:23         ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Török Edwin @ 2010-03-16 10:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra,
	Paul Mackerras, x86, linux-kernel, Fr??d??ric Weisbecker,
	Arnaldo Carvalho de Melo

On 03/16/2010 10:47 AM, Ingo Molnar wrote:
> * T??r??k Edwin <edwintorok@gmail.com> wrote:
> 
>> On 03/15/2010 06:23 PM, T??r??k Edwin wrote:
>>> On 03/15/2010 05:34 PM, T??r??k Edwin wrote:
>>>> It would be good if perf knew how to lookup symbols in kernel modules!
>>> BTW perf report -m -k /home/edwin/builds/linux-2.6/vmlinux doesn't show
>>> the symbols either.
>> I always forget that, unlike every other program, perf doesn't install
>> by default to /usr/local!
>> So I was running the wrong version of perf (from an older kernel), since
>> perf was installed to $HOME/bin (which of course isn't in sudo's path).
>>
>> Sorry for the confusion, the 2.6.33 perf DOES know how to lookup the
>> symbols:
>>     9.92%  glxgears  [radeon]                           [k]
>> r600_packet3_check
>>             |
>>             --- r600_packet3_check
>>                |
>>                |--96.80%-- r600_cs_parse
> 
> Ok, great!

BTW the patch I sent yesterday for tracing 32-bit apps is still needed,
since that is a kernel patch, and it wasn't due to using the wrong perf.

> 
> I suspect we could install into /usr/local too. Do you want to send a patch 
> for that?

Sent.

BTW I think perf would need some documentation on how to install, and
what packages you need to build everything, what permissions it needs to
run, etc.

1. manpages
For example by default the manpages don't get built and installed, so
perf report --help doesn't work. It needs a 'make man', and 'make
install-man'.
This is fine, because they need asciidoc and xmlto which aren't usually
installed on every system. But there should be some documentation
mentioning this.

2. privileges
I just found out that perf works without root privileges (I just
assumed	 it needed root, since oprofile needs it).

3. non-working targets?
Also there are some targets in Documentation that can't be built due to
missing files, like pdf which needs a non-existent user-manual.xml.

4. unresolved symbols
Sometimes I get symbol addresses that are not resolved, like this:
    57.03%   :32216       7fc7dc0acfa6  [.] 0x007fc7dc0acfa6
    12.39%   :32216  [radeon]           [k] r600_packet3_check
     4.92%   :32216  [radeon]           [k] r600_cs_packet_parse
     2.70%   :32216  [radeon]           [k] r600_cs_parse

Is this due to ASLR? Does perf need ASLR disabled?
That address corresponds to this:
7fc7dc07e000-7fc7dc2ef000 r-xp 00000000 fd:03 27713
 /opt/xorg/lib/dri/r600_dri.so

It'd of course be nice if there was a distro package for perf, I think
I'll file a RFP in Debian for that.

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: install into /usr/local by default.
  2010-03-16 10:10       ` Ingo Molnar
@ 2010-03-16 10:20         ` Avi Kivity
  2010-03-16 10:25           ` Ingo Molnar
  2010-03-16 10:24         ` Török Edwin
  1 sibling, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2010-03-16 10:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: T??r??k Edwin, Peter Zijlstra, Paul Mackerras, linux-kernel,
	Frédéric Weisbecker, Arnaldo Carvalho de Melo

On 03/16/2010 12:10 PM, Ingo Molnar wrote:
> * T??r??k Edwin<edwintorok@gmail.com>  wrote:
>
>    
>> It was confusing to install into $(HOME)/bin, especially since there was
>> no documentation mentioning where perf gets installed by default.
>> So install to /usr/local by default, as other programs do, and allow users to
>> override the install location by specifying the prefix explicitly.
>>
>> Signed-off-by: T??r??k Edwin<edwintorok@gmail.com>
>> ---
>>   tools/perf/Makefile |    2 +-
>>   1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
>> index 2e7fa3a..8e8c199 100644
>> --- a/tools/perf/Makefile
>> +++ b/tools/perf/Makefile
>> @@ -216,7 +216,7 @@ STRIP ?= strip
>>   # runtime figures out where they are based on the path to the executable.
>>   # This can help installing the suite in a relocatable way.
>>
>> -prefix = $(HOME)
>> +prefix = /usr/local
>>   bindir_relative = bin
>>   bindir = $(prefix)/$(bindir_relative)
>>   mandir = share/man
>>      
> Btw., we inherited that default prefix from the Git project.
>
> Is there a way to get it into ~/bin/ if the user does not have permission to
> /usr/local ? (i.e. doesnt run it as root)
>
> That's a really convenient aspect of doing a 'make install' as user. (Which i
> tend to do in most cases)
>    

What about people (like me) who do 'make && sudo make install'?

Can we make it position independent and derive the path from /proc/$$/exe?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: fix callgraphs of 32-bit processes on 64-bit kernels.
  2010-03-16 10:17       ` Török Edwin
@ 2010-03-16 10:23         ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2010-03-16 10:23 UTC (permalink / raw)
  To: T?r?k Edwin
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra,
	Paul Mackerras, x86, linux-kernel, Fr??d??ric Weisbecker,
	Arnaldo Carvalho de Melo


* T?r?k Edwin <edwintorok@gmail.com> wrote:

> On 03/16/2010 10:47 AM, Ingo Molnar wrote:
> > * T??r??k Edwin <edwintorok@gmail.com> wrote:
> > 
> >> On 03/15/2010 06:23 PM, T??r??k Edwin wrote:
> >>> On 03/15/2010 05:34 PM, T??r??k Edwin wrote:
> >>>> It would be good if perf knew how to lookup symbols in kernel modules!
> >>> BTW perf report -m -k /home/edwin/builds/linux-2.6/vmlinux doesn't show
> >>> the symbols either.
> >> I always forget that, unlike every other program, perf doesn't install
> >> by default to /usr/local!
> >> So I was running the wrong version of perf (from an older kernel), since
> >> perf was installed to $HOME/bin (which of course isn't in sudo's path).
> >>
> >> Sorry for the confusion, the 2.6.33 perf DOES know how to lookup the
> >> symbols:
> >>     9.92%  glxgears  [radeon]                           [k]
> >> r600_packet3_check
> >>             |
> >>             --- r600_packet3_check
> >>                |
> >>                |--96.80%-- r600_cs_parse
> > 
> > Ok, great!
> 
> BTW the patch I sent yesterday for tracing 32-bit apps is still needed, 
> since that is a kernel patch, and it wasn't due to using the wrong perf.

I've Cc:-ed Frederic for that bug. (Frederic has written a good deal of that 
code)

> > I suspect we could install into /usr/local too. Do you want to send a patch 
> > for that?
> 
> Sent.
> 
> BTW I think perf would need some documentation on how to install, and what 
> packages you need to build everything, what permissions it needs to run, 
> etc.

Agreed. (I've Cc:-ed Arnaldo who has a pending fix in this area.)

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: install into /usr/local by default.
  2010-03-16 10:10       ` Ingo Molnar
  2010-03-16 10:20         ` Avi Kivity
@ 2010-03-16 10:24         ` Török Edwin
  1 sibling, 0 replies; 22+ messages in thread
From: Török Edwin @ 2010-03-16 10:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Paul Mackerras, linux-kernel,
	Frédéric Weisbecker, Arnaldo Carvalho de Melo,
	Avi Kivity

On 03/16/2010 12:10 PM, Ingo Molnar wrote:
> * T??r??k Edwin <edwintorok@gmail.com> wrote:
> 
>> It was confusing to install into $(HOME)/bin, especially since there was
>> no documentation mentioning where perf gets installed by default.
>> So install to /usr/local by default, as other programs do, and allow users to
>> override the install location by specifying the prefix explicitly.
>>
>> Signed-off-by: T??r??k Edwin <edwintorok@gmail.com>
>> ---
>>  tools/perf/Makefile |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
>> index 2e7fa3a..8e8c199 100644
>> --- a/tools/perf/Makefile
>> +++ b/tools/perf/Makefile
>> @@ -216,7 +216,7 @@ STRIP ?= strip
>>  # runtime figures out where they are based on the path to the executable.
>>  # This can help installing the suite in a relocatable way.
>>  
>> -prefix = $(HOME)
>> +prefix = /usr/local
>>  bindir_relative = bin
>>  bindir = $(prefix)/$(bindir_relative)
>>  mandir = share/man
> 
> Btw., we inherited that default prefix from the Git project.
> 
> Is there a way to get it into ~/bin/ if the user does not have permission to 
> /usr/local ? (i.e. doesnt run it as root)

That is complicated, I usually run make as a normal user, and only do
make install as root (or sudo make install).
I do that for the kernel itself, and usually every program I build (I
don't like compiling as root).

> 
> That's a really convenient aspect of doing a 'make install' as user. (Which i 
> tend to do in most cases)

On 03/16/2010 12:20 PM, Avi Kivity wrote:
> What about people (like me) who do 'make && sudo make install'?
>
> Can we make it position independent and derive the path from /proc/$$/exe?
>

There is a RUNTIME_PREFIX define (undocumented...) that seems to do
something like that.
Are there any security implications of using that by default?

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: install into /usr/local by default.
  2010-03-16 10:20         ` Avi Kivity
@ 2010-03-16 10:25           ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2010-03-16 10:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: T??r??k Edwin, Peter Zijlstra, Paul Mackerras, linux-kernel,
	Fr?d?ric Weisbecker, Arnaldo Carvalho de Melo


* Avi Kivity <avi@redhat.com> wrote:

> On 03/16/2010 12:10 PM, Ingo Molnar wrote:
> >* T??r??k Edwin<edwintorok@gmail.com>  wrote:
> >
> >>It was confusing to install into $(HOME)/bin, especially since there was
> >>no documentation mentioning where perf gets installed by default.
> >>So install to /usr/local by default, as other programs do, and allow users to
> >>override the install location by specifying the prefix explicitly.
> >>
> >>Signed-off-by: T??r??k Edwin<edwintorok@gmail.com>
> >>---
> >>  tools/perf/Makefile |    2 +-
> >>  1 files changed, 1 insertions(+), 1 deletions(-)
> >>
> >>diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> >>index 2e7fa3a..8e8c199 100644
> >>--- a/tools/perf/Makefile
> >>+++ b/tools/perf/Makefile
> >>@@ -216,7 +216,7 @@ STRIP ?= strip
> >>  # runtime figures out where they are based on the path to the executable.
> >>  # This can help installing the suite in a relocatable way.
> >>
> >>-prefix = $(HOME)
> >>+prefix = /usr/local
> >>  bindir_relative = bin
> >>  bindir = $(prefix)/$(bindir_relative)
> >>  mandir = share/man
> >Btw., we inherited that default prefix from the Git project.
> >
> >Is there a way to get it into ~/bin/ if the user does not have permission to
> >/usr/local ? (i.e. doesnt run it as root)
> >
> >That's a really convenient aspect of doing a 'make install' as user. (Which i
> >tend to do in most cases)
> 
> What about people (like me) who do 'make && sudo make install'?

I'd like everyone to be happy :-)

In case of irreconcilable differences i prefer the creation of two parallel 
universes, one for each variant. This does not seem to be such a case though:

> Can we make it position independent and derive the path from /proc/$$/exe?

Sounds like a useful approach ...

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels.
  2010-03-15 15:34 ` [PATCH] perf: x86: " Török Edwin
@ 2010-03-16 14:49   ` Frederic Weisbecker
  2010-03-16 15:02     ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Török Edwin
  2010-03-16 15:04     ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin
  0 siblings, 2 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2010-03-16 14:49 UTC (permalink / raw)
  To: Török Edwin
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra,
	Paul Mackerras, x86, linux-kernel

On Mon, Mar 15, 2010 at 05:34:20PM +0200, Török Edwin wrote:
> When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
> stopped after the first function, because it has seen a garbage memory address
> (tried to interpret the frame pointer, and return address as a 64-bit pointer).
> 
> Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set.
> 
> Note that TIF_IA32 flag must be used, and not is_compat_task(), because the
> latter is only set when the 32-bit process is executing a syscall,
> which may not always be the case (when tracing page fault events for example).
> 
> Signed-off-by: Török Edwin <edwintorok@gmail.com>
> ---
>  arch/x86/kernel/cpu/perf_event.c |   33 +++++++++++++++++++++++++++++++++
>  1 files changed, 33 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 8c1c070..13ee83a 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
>  	return bytes == sizeof(*frame);
>  }
>  
> +struct stack_frame_ia32 {
> +    u32 next_frame;
> +    u32 return_address;
> +};
> +
> +static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame)
> +{
> +	unsigned long bytes;
> +
> +	bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame));
> +
> +	return bytes == sizeof(*frame);
> +}
> +
>  static void
>  perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
>  {
> @@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
>  
>  	callchain_store(entry, PERF_CONTEXT_USER);
>  	callchain_store(entry, regs->ip);
> +	if (test_thread_flag(TIF_IA32)) {
> +	    /* 32-bit process in 64-bit kernel. */
> +	    u32 fp = regs->bp;
> +	    struct stack_frame_ia32 frame;
> +	    while (entry->nr < PERF_MAX_STACK_DEPTH) {
> +		frame.next_frame     = 0;
> +		frame.return_address = 0;
> +
> +		if (!copy_stack_frame_ia32(fp, &frame))
> +			break;
> +
> +		if ((unsigned long)fp < regs->sp)
> +			break;



Shouldn't it be this?

	if (fp < (u32)regs->sp)

As the high part of fp is zeroed in the cast but
the high part of regs->sp remains. I don't know what could be
there, but since the user doesn't use it, perhaps just garbage?
And that could messup the comparison.

Otherwise the rest looks pretty good to me, nice catch.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2.
  2010-03-16 14:49   ` Frederic Weisbecker
@ 2010-03-16 15:02     ` Török Edwin
  2010-03-16 17:05       ` Ingo Molnar
  2010-03-16 15:04     ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin
  1 sibling, 1 reply; 22+ messages in thread
From: Török Edwin @ 2010-03-16 15:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra,
	Paul Mackerras, x86, linux-kernel, Török Edwin

When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
stopped after the first function, because it has seen a garbage memory address
(tried to interpret the frame pointer, and return address as a 64-bit pointer).

Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set.

Note that TIF_IA32 flag must be used, and not is_compat_task(), because the
latter is only set when the 32-bit process is executing a syscall,
which may not always be the case (when tracing page fault events for example).

Signed-off-by: Török Edwin <edwintorok@gmail.com>
---
 arch/x86/kernel/cpu/perf_event.c |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8c1c070..b85ea9f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
 	return bytes == sizeof(*frame);
 }
 
+struct stack_frame_ia32 {
+    u32 next_frame;
+    u32 return_address;
+};
+
+static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame)
+{
+	unsigned long bytes;
+
+	bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame));
+
+	return bytes == sizeof(*frame);
+}
+
 static void
 perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
 {
@@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
 
 	callchain_store(entry, PERF_CONTEXT_USER);
 	callchain_store(entry, regs->ip);
+	if (test_thread_flag(TIF_IA32)) {
+	    /* 32-bit process in 64-bit kernel. */
+	    u32 fp = regs->bp;
+	    struct stack_frame_ia32 frame;
+	    while (entry->nr < PERF_MAX_STACK_DEPTH) {
+		frame.next_frame     = 0;
+		frame.return_address = 0;
+
+		if (!copy_stack_frame_ia32(fp, &frame))
+			break;
+
+		if (fp < (u32)regs->sp)
+			break;
+
+		callchain_store(entry, frame.return_address);
+		fp = frame.next_frame;
+	    }
+	    return;
+	}
 
 	while (entry->nr < PERF_MAX_STACK_DEPTH) {
 		frame.next_frame	     = NULL;
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels.
  2010-03-16 14:49   ` Frederic Weisbecker
  2010-03-16 15:02     ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Török Edwin
@ 2010-03-16 15:04     ` Török Edwin
  1 sibling, 0 replies; 22+ messages in thread
From: Török Edwin @ 2010-03-16 15:04 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra,
	Paul Mackerras, x86, linux-kernel

On 03/16/2010 04:49 PM, Frederic Weisbecker wrote:
> On Mon, Mar 15, 2010 at 05:34:20PM +0200, Török Edwin wrote:
>> When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
>> stopped after the first function, because it has seen a garbage memory address
>> (tried to interpret the frame pointer, and return address as a 64-bit pointer).
>>
>> Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set.
>>
>> Note that TIF_IA32 flag must be used, and not is_compat_task(), because the
>> latter is only set when the 32-bit process is executing a syscall,
>> which may not always be the case (when tracing page fault events for example).
>>
>> Signed-off-by: Török Edwin <edwintorok@gmail.com>
>> ---
>>  arch/x86/kernel/cpu/perf_event.c |   33 +++++++++++++++++++++++++++++++++
>>  1 files changed, 33 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
>> index 8c1c070..13ee83a 100644
>> --- a/arch/x86/kernel/cpu/perf_event.c
>> +++ b/arch/x86/kernel/cpu/perf_event.c
>> @@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
>>  	return bytes == sizeof(*frame);
>>  }
>>  
>> +struct stack_frame_ia32 {
>> +    u32 next_frame;
>> +    u32 return_address;
>> +};
>> +
>> +static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame)
>> +{
>> +	unsigned long bytes;
>> +
>> +	bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame));
>> +
>> +	return bytes == sizeof(*frame);
>> +}
>> +
>>  static void
>>  perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
>>  {
>> @@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
>>  
>>  	callchain_store(entry, PERF_CONTEXT_USER);
>>  	callchain_store(entry, regs->ip);
>> +	if (test_thread_flag(TIF_IA32)) {
>> +	    /* 32-bit process in 64-bit kernel. */
>> +	    u32 fp = regs->bp;
>> +	    struct stack_frame_ia32 frame;
>> +	    while (entry->nr < PERF_MAX_STACK_DEPTH) {
>> +		frame.next_frame     = 0;
>> +		frame.return_address = 0;
>> +
>> +		if (!copy_stack_frame_ia32(fp, &frame))
>> +			break;
>> +
>> +		if ((unsigned long)fp < regs->sp)
>> +			break;
> 
> 
> 
> Shouldn't it be this?
> 
> 	if (fp < (u32)regs->sp)
> 
> As the high part of fp is zeroed in the cast but
> the high part of regs->sp remains. I don't know what could be
> there, but since the user doesn't use it, perhaps just garbage?
> And that could messup the comparison.

Agreed. I sent a new patch with that change.

> 
> Otherwise the rest looks pretty good to me, nice catch.
> 

Thanks.

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2.
  2010-03-16 15:02     ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Török Edwin
@ 2010-03-16 17:05       ` Ingo Molnar
  2010-03-17  8:48         ` Török Edwin
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2010-03-16 17:05 UTC (permalink / raw)
  To: T??r??k Edwin, H. Peter Anvin
  Cc: Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Peter Zijlstra, Paul Mackerras, x86, linux-kernel


* T??r??k Edwin <edwintorok@gmail.com> wrote:

> When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
> stopped after the first function, because it has seen a garbage memory address
> (tried to interpret the frame pointer, and return address as a 64-bit pointer).
> 
> Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set.
> 
> Note that TIF_IA32 flag must be used, and not is_compat_task(), because the
> latter is only set when the 32-bit process is executing a syscall,
> which may not always be the case (when tracing page fault events for example).
> 
> Signed-off-by: T??r??k Edwin <edwintorok@gmail.com>
> ---
>  arch/x86/kernel/cpu/perf_event.c |   33 +++++++++++++++++++++++++++++++++
>  1 files changed, 33 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 8c1c070..b85ea9f 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
>  	return bytes == sizeof(*frame);
>  }
>  
> +struct stack_frame_ia32 {
> +    u32 next_frame;
> +    u32 return_address;
> +};

Please put such new data type definitions not into the middle of a .c file but 
next to where struct stack_frame is defined.

> +
> +static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame)
> +{
> +	unsigned long bytes;
> +
> +	bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame));
> +
> +	return bytes == sizeof(*frame);
> +}
>

Single-use - should be inline i guess.

> +
>  static void
>  perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
>  {
> @@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
>  
>  	callchain_store(entry, PERF_CONTEXT_USER);
>  	callchain_store(entry, regs->ip);
> +	if (test_thread_flag(TIF_IA32)) {
> +	    /* 32-bit process in 64-bit kernel. */
> +	    u32 fp = regs->bp;
> +	    struct stack_frame_ia32 frame;
> +	    while (entry->nr < PERF_MAX_STACK_DEPTH) {

Please put newlines after local variable definition so that they are clearly 
delimited.

Also, the tabulation is weird - please run it through scripts/checkpatch.pl.

> +		frame.next_frame     = 0;
> +		frame.return_address = 0;
> +
> +		if (!copy_stack_frame_ia32(fp, &frame))
> +			break;
> +
> +		if (fp < (u32)regs->sp)
> +			break;
> +
> +		callchain_store(entry, frame.return_address);
> +		fp = frame.next_frame;
> +	    }
> +	    return;

This whole new block should probably be in a helper inline?

Also, it should probably be #ifdef CONFIG_COMPAT or so.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2.
  2010-03-16 17:05       ` Ingo Molnar
@ 2010-03-17  8:48         ` Török Edwin
  2010-03-17  8:49           ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3 Török Edwin
  2010-03-17  9:59           ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Ingo Molnar
  0 siblings, 2 replies; 22+ messages in thread
From: Török Edwin @ 2010-03-17  8:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Paul Mackerras, x86, linux-kernel

On 03/16/2010 07:05 PM, Ingo Molnar wrote:
> * T??r??k Edwin <edwintorok@gmail.com> wrote:
> 
>> When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
>> stopped after the first function, because it has seen a garbage memory address
>> (tried to interpret the frame pointer, and return address as a 64-bit pointer).
>>
>> Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set.
>>
>> Note that TIF_IA32 flag must be used, and not is_compat_task(), because the
>> latter is only set when the 32-bit process is executing a syscall,
>> which may not always be the case (when tracing page fault events for example).
>>
>> Signed-off-by: T??r??k Edwin <edwintorok@gmail.com>
>> ---
>>  arch/x86/kernel/cpu/perf_event.c |   33 +++++++++++++++++++++++++++++++++
>>  1 files changed, 33 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
>> index 8c1c070..b85ea9f 100644
>> --- a/arch/x86/kernel/cpu/perf_event.c
>> +++ b/arch/x86/kernel/cpu/perf_event.c
>> @@ -2401,6 +2401,20 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
>>  	return bytes == sizeof(*frame);
>>  }
>>  
>> +struct stack_frame_ia32 {
>> +    u32 next_frame;
>> +    u32 return_address;
>> +};
> 
> Please put such new data type definitions not into the middle of a .c file but 
> next to where struct stack_frame is defined.

Ok.

> 
>> +
>> +static int copy_stack_frame_ia32(u32 fp, struct stack_frame_ia32 *frame)
>> +{
>> +	unsigned long bytes;
>> +
>> +	bytes = copy_from_user_nmi(frame, (const void __user*)(unsigned long)fp, sizeof(*frame));
>> +
>> +	return bytes == sizeof(*frame);
>> +}
>>
> 
> Single-use - should be inline i guess.
> 

So should be copy_stack_frame() then.

>> +
>>  static void
>>  perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
>>  {
>> @@ -2414,6 +2428,25 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
>>  
>>  	callchain_store(entry, PERF_CONTEXT_USER);
>>  	callchain_store(entry, regs->ip);
>> +	if (test_thread_flag(TIF_IA32)) {
>> +	    /* 32-bit process in 64-bit kernel. */
>> +	    u32 fp = regs->bp;
>> +	    struct stack_frame_ia32 frame;
>> +	    while (entry->nr < PERF_MAX_STACK_DEPTH) {
> 
> Please put newlines after local variable definition so that they are clearly 
> delimited.
> 
> Also, the tabulation is weird - please run it through scripts/checkpatch.pl.

I forgot to change shiftwidth to 8 (I usually use 4).
I cleaned the checkpatch warnings now.

> 
>> +		frame.next_frame     = 0;
>> +		frame.return_address = 0;
>> +
>> +		if (!copy_stack_frame_ia32(fp, &frame))
>> +			break;
>> +
>> +		if (fp < (u32)regs->sp)
>> +			break;
>> +
>> +		callchain_store(entry, frame.return_address);
>> +		fp = frame.next_frame;
>> +	    }
>> +	    return;
> 
> This whole new block should probably be in a helper inline?

To reduce indenting, or why?

> 
> Also, it should probably be #ifdef CONFIG_COMPAT or so.

Ok, see V3 of my patch.

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3.
  2010-03-17  8:48         ` Török Edwin
@ 2010-03-17  8:49           ` Török Edwin
  2010-03-17  9:54             ` Ingo Molnar
  2010-03-17 10:07             ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4 Török Edwin
  2010-03-17  9:59           ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Ingo Molnar
  1 sibling, 2 replies; 22+ messages in thread
From: Török Edwin @ 2010-03-17  8:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Török Edwin, H. Peter Anvin, Frederic Weisbecker,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Paul Mackerras, x86,
	linux-kernel

When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
stopped after the first function, because it has seen a garbage memory address
(tried to interpret the frame pointer, and return address as a 64-bit pointer).

Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set.

Note that TIF_IA32 flag must be used, and not is_compat_task(), because the
latter is only set when the 32-bit process is executing a syscall,
which may not always be the case (when tracing page fault events for example).

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Török Edwin <edwintorok@gmail.com>
---
 arch/x86/kernel/cpu/perf_event.c |   38 +++++++++++++++++++++++++++++++++-----
 arch/x86/kernel/dumpstack.h      |    5 +++++
 2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8c1c070..51f72f7 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -26,6 +26,7 @@
 #include <asm/apic.h>
 #include <asm/stacktrace.h>
 #include <asm/nmi.h>
+#include <asm/compat.h>
 
 static u64 perf_event_mask __read_mostly;
 
@@ -2392,14 +2393,32 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 	return len;
 }
 
-static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
+#ifdef CONFIG_COMPAT
+
+static inline void
+perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
 {
-	unsigned long bytes;
+	/* 32-bit process in 64-bit kernel. */
+	struct stack_frame_ia32 frame;
+	const void __user *fp = compat_ptr(regs->bp);
+
+	while (entry->nr < PERF_MAX_STACK_DEPTH) {
+		unsigned long bytes;
+		frame.next_frame     = 0;
+		frame.return_address = 0;
+
+		bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
+		if (bytes != sizeof(frame))
+			break;
 
-	bytes = copy_from_user_nmi(frame, fp, sizeof(*frame));
+		if (fp < compat_ptr(regs->sp))
+			break;
 
-	return bytes == sizeof(*frame);
+		callchain_store(entry, frame.return_address);
+		fp = compat_ptr(frame.next_frame);
+	}
 }
+#endif
 
 static void
 perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
@@ -2415,11 +2434,20 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
 	callchain_store(entry, PERF_CONTEXT_USER);
 	callchain_store(entry, regs->ip);
 
+#ifdef CONFIG_COMPAT
+	if (test_thread_flag(TIF_IA32)) {
+		perf_callchain_user32(regs, entry);
+		return;
+	}
+#endif
+
 	while (entry->nr < PERF_MAX_STACK_DEPTH) {
+		unsigned long bytes;
 		frame.next_frame	     = NULL;
 		frame.return_address = 0;
 
-		if (!copy_stack_frame(fp, &frame))
+		bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
+		if (bytes != sizeof(frame))
 			break;
 
 		if ((unsigned long)fp < regs->sp)
diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h
index 4fd1420..3f0a242 100644
--- a/arch/x86/kernel/dumpstack.h
+++ b/arch/x86/kernel/dumpstack.h
@@ -29,4 +29,9 @@ struct stack_frame {
 	struct stack_frame *next_frame;
 	unsigned long return_address;
 };
+
+struct stack_frame_ia32 {
+    u32 next_frame;
+    u32 return_address;
+};
 #endif
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3.
  2010-03-17  8:49           ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3 Török Edwin
@ 2010-03-17  9:54             ` Ingo Molnar
  2010-03-17 10:07             ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4 Török Edwin
  1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2010-03-17  9:54 UTC (permalink / raw)
  To: T??r??k Edwin
  Cc: H. Peter Anvin, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Paul Mackerras, x86, linux-kernel


* T??r??k Edwin <edwintorok@gmail.com> wrote:

> +#ifdef CONFIG_COMPAT
> +	if (test_thread_flag(TIF_IA32)) {
> +		perf_callchain_user32(regs, entry);
> +		return;
> +	}
> +#endif

You can push that test into perf_callchain_user32() and turn it into something 
like:

	if (perf_callchain_user32(regs, entry))
		return;

without the #ifdef here.

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2.
  2010-03-17  8:48         ` Török Edwin
  2010-03-17  8:49           ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3 Török Edwin
@ 2010-03-17  9:59           ` Ingo Molnar
  1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2010-03-17  9:59 UTC (permalink / raw)
  To: T?r?k Edwin
  Cc: H. Peter Anvin, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Paul Mackerras, x86, linux-kernel


* T?r?k Edwin <edwintorok@gmail.com> wrote:

> >> +	    }
> >> +	    return;
> > 
> > This whole new block should probably be in a helper inline?
> 
> To reduce indenting, or why?

Mainly to increase cleanliness. We want code that isolates separate blocks of 
logic into separate sections. So a simple construct of:

        if (perf_callchain_user32(regs, entry))
                return;

will tell the reader of the code that 'ok, this is 32-bit compat stuff'. It 
doesnt, at this level, intrude into the main logic of that function. Then if 
we look at perf_callchain_user32() we see all the 32-bit compat stack frame 
walking details.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4.
  2010-03-17  8:49           ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3 Török Edwin
  2010-03-17  9:54             ` Ingo Molnar
@ 2010-03-17 10:07             ` Török Edwin
  2010-03-30 23:18               ` Frederic Weisbecker
  1 sibling, 1 reply; 22+ messages in thread
From: Török Edwin @ 2010-03-17 10:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Török Edwin, H. Peter Anvin, Frederic Weisbecker,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Paul Mackerras, x86,
	linux-kernel

When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
stopped after the first function, because it has seen a garbage memory address
(tried to interpret the frame pointer, and return address as a 64-bit pointer).

Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set.

Note that TIF_IA32 flag must be used, and not is_compat_task(), because the
latter is only set when the 32-bit process is executing a syscall,
which may not always be the case (when tracing page fault events for example).

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Török Edwin <edwintorok@gmail.com>
---
 arch/x86/kernel/cpu/perf_event.c |   44 +++++++++++++++++++++++++++++++++----
 arch/x86/kernel/dumpstack.h      |    5 ++++
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8c1c070..5f97f29 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -26,6 +26,7 @@
 #include <asm/apic.h>
 #include <asm/stacktrace.h>
 #include <asm/nmi.h>
+#include <asm/compat.h>
 
 static u64 perf_event_mask __read_mostly;
 
@@ -2392,14 +2393,42 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 	return len;
 }
 
-static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
+#ifdef CONFIG_COMPAT
+static inline int
+perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
 {
-	unsigned long bytes;
+	/* 32-bit process in 64-bit kernel. */
+	struct stack_frame_ia32 frame;
+	const void __user *fp;
+
+	if (!test_thread_flag(TIF_IA32))
+		return 0;
+
+	fp = compat_ptr(regs->bp);
+	while (entry->nr < PERF_MAX_STACK_DEPTH) {
+		unsigned long bytes;
+		frame.next_frame     = 0;
+		frame.return_address = 0;
 
-	bytes = copy_from_user_nmi(frame, fp, sizeof(*frame));
+		bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
+		if (bytes != sizeof(frame))
+			break;
+
+		if (fp < compat_ptr(regs->sp))
+			break;
 
-	return bytes == sizeof(*frame);
+		callchain_store(entry, frame.return_address);
+		fp = compat_ptr(frame.next_frame);
+	}
+	return 1;
 }
+#else
+static inline int
+perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
+{
+    return 0;
+}
+#endif
 
 static void
 perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
@@ -2415,11 +2444,16 @@ perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
 	callchain_store(entry, PERF_CONTEXT_USER);
 	callchain_store(entry, regs->ip);
 
+	if (perf_callchain_user32(regs, entry))
+		return;
+
 	while (entry->nr < PERF_MAX_STACK_DEPTH) {
+		unsigned long bytes;
 		frame.next_frame	     = NULL;
 		frame.return_address = 0;
 
-		if (!copy_stack_frame(fp, &frame))
+		bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
+		if (bytes != sizeof(frame))
 			break;
 
 		if ((unsigned long)fp < regs->sp)
diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h
index 4fd1420..3f0a242 100644
--- a/arch/x86/kernel/dumpstack.h
+++ b/arch/x86/kernel/dumpstack.h
@@ -29,4 +29,9 @@ struct stack_frame {
 	struct stack_frame *next_frame;
 	unsigned long return_address;
 };
+
+struct stack_frame_ia32 {
+    u32 next_frame;
+    u32 return_address;
+};
 #endif
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4.
  2010-03-17 10:07             ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4 Török Edwin
@ 2010-03-30 23:18               ` Frederic Weisbecker
  0 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2010-03-30 23:18 UTC (permalink / raw)
  To: Török Edwin
  Cc: Ingo Molnar, H. Peter Anvin, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Paul Mackerras, x86, linux-kernel

On Wed, Mar 17, 2010 at 12:07:16PM +0200, Török Edwin wrote:
> When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
> stopped after the first function, because it has seen a garbage memory address
> (tried to interpret the frame pointer, and return address as a 64-bit pointer).
> 
> Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set.
> 
> Note that TIF_IA32 flag must be used, and not is_compat_task(), because the
> latter is only set when the 32-bit process is executing a syscall,
> which may not always be the case (when tracing page fault events for example).
> 
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Török Edwin <edwintorok@gmail.com>


Queued, thanks!


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2010-03-31  0:07 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-15 15:34 fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin
2010-03-15 15:34 ` [PATCH] perf: x86: " Török Edwin
2010-03-16 14:49   ` Frederic Weisbecker
2010-03-16 15:02     ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Török Edwin
2010-03-16 17:05       ` Ingo Molnar
2010-03-17  8:48         ` Török Edwin
2010-03-17  8:49           ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V3 Török Edwin
2010-03-17  9:54             ` Ingo Molnar
2010-03-17 10:07             ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V4 Török Edwin
2010-03-30 23:18               ` Frederic Weisbecker
2010-03-17  9:59           ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels V2 Ingo Molnar
2010-03-16 15:04     ` [PATCH] perf: x86: fix callgraphs of 32-bit processes on 64-bit kernels Török Edwin
2010-03-15 16:23 ` Török Edwin
2010-03-16  8:18   ` Török Edwin
2010-03-16  8:47     ` Ingo Molnar
2010-03-16 10:17       ` Török Edwin
2010-03-16 10:23         ` Ingo Molnar
2010-03-16  9:57     ` [PATCH] perf: install into /usr/local by default Török Edwin
2010-03-16 10:10       ` Ingo Molnar
2010-03-16 10:20         ` Avi Kivity
2010-03-16 10:25           ` Ingo Molnar
2010-03-16 10:24         ` Török Edwin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox