From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758811Ab0CPU43 (ORCPT ); Tue, 16 Mar 2010 16:56:29 -0400 Received: from mail-iw0-f176.google.com ([209.85.223.176]:51007 "EHLO mail-iw0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754423Ab0CPU41 (ORCPT ); Tue, 16 Mar 2010 16:56:27 -0400 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; b=QzBlz+2YELAGQgIG8VpRNhUSY+BOWH52VU6VbcfVACTwNDHvPtL868VGoHzi74r622 G3EVqQ7CDC9MSUJYfn/vfNFthhGBL+MIi/aa6TK74H6BLa0SGRvck5TqkitCXZvyG8ge ikqoESBXyFUtRKleU+Jxc1D8QeiBjnIMT9i1M= Date: Tue, 16 Mar 2010 21:56:28 +0100 From: Frederic Weisbecker To: Paul Mackerras Cc: Ingo Molnar , Peter Zijlstra , benh@kernel.crashing.org, linux-kernel@vger.kernel.org, anton@samba.org, linuxppc-dev@ozlabs.org Subject: Re: [PATCH] powerpc/perf_events: Implement perf_arch_fetch_caller_regs for powerpc Message-ID: <20100316205609.GA6822@nowhere> References: <20100315054615.GA6245@drongo> <20100315210450.GF5082@nowhere> <20100316032213.GA3656@drongo> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100316032213.GA3656@drongo> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Mar 16, 2010 at 02:22:13PM +1100, Paul Mackerras wrote: > On Mon, Mar 15, 2010 at 10:04:54PM +0100, Frederic Weisbecker wrote: > > On Mon, Mar 15, 2010 at 04:46:15PM +1100, Paul Mackerras wrote: > > > > 14.99% perf [kernel.kallsyms] [k] ._raw_spin_lock > > > | > > > --- ._raw_spin_lock > > > | > > > |--25.00%-- .alloc_fd > > > | (nil) > > > | | > > > | |--50.00%-- .anon_inode_getfd > > > | | .sys_perf_event_open > > > | | syscall_exit > > > | | syscall > > > | | create_counter > > > | | __cmd_record > > > | | run_builtin > > > | | main > > > | | 0xfd2e704 > > > | | 0xfd2e8c0 > > > | | (nil) > > > > > > ... etc. > > > > > > Signed-off-by: Paul Mackerras > > > > > > Cool! > > By the way, I notice that gcc tends to inline the tracing functions, > which means that by going up 2 stack frames we miss some of the > functions. For example, for the lock:lock_acquire event, we have > _raw_spin_lock() -> lock_acquire() -> trace_lock_acquire() -> > perf_trace_lock_acquire() -> perf_trace_templ_lock_acquire() -> > perf_fetch_caller_regs() -> perf_arch_fetch_caller_regs(). > > But in the ppc64 kernel binary I just built, gcc inlined > trace_lock_acquire in lock_acquire, and perf_trace_templ_lock_acquire > in perf_trace_lock_acquire. Given that perf_fetch_caller_regs is > explicitly inlined, going up two levels from perf_fetch_caller_regs > gets us to _raw_spin_lock, whereas I think you intended it to get us > to trace_lock_acquire. I'm not sure what to do about that - any > thoughts? Yeah I've indeed seen this, and the problem is especially the fact perf_trace_templ_lock_acquire may or may not be inlined. It is used for trace events that use the TRACE_EVENT_CLASS thing. We define a pattern of event structure that is shared among several events. For example event A and event B share perf_trace_templ_foo. Both will have a different perf_trace_blah but those perf_trace_blah will both call the same perf_trace_templ_foo(), in this case, it won't be inlined. Events that don't share a pattern will have their perf_trace_templ inlined, because there will be an exclusive 1:1 relationship between both. The rewind of 2 is well suited for events sharing a pattern, ip will match the right event source, and not one of its callers. Unfortunately, the others are more unlucky. I didn't mind much about this yet because it had no bad effect on lock events. Quite the opposite actually. It's not very interesting to have lock_acquire as the event source unless you have a callchain. If you have no callchain, you'll see a lot of such in perf report: sym1 lock_aquire sym2 lock_acquire sym3 lock_acquire What you want here is the function that called lock_acquire. But if you have a callchain it's fine, because you have the nature of the event (lock_aquire) and the origin as well. That said, lock events are an exception where the mistake has a lucky result. Other inlined events are harmed as we lose their most important caller. So I'm going to fix that. I can just fetch the regs from perf_trace_foo() and pass them to perf_trace_templ_foo() and here we are. Thanks.