From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753294AbcCYDZR (ORCPT ); Thu, 24 Mar 2016 23:25:17 -0400 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:47819 "EHLO out1-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751337AbcCYDZP (ORCPT ); Thu, 24 Mar 2016 23:25:15 -0400 X-Sasl-enc: vk1owmejn171QaMbLFHrVAVr41VWJ+861WvD88e0KXzW 1458876313 User-agent: mu4e 0.9.11; emacs 25.0.92.1 From: Dima Kogan To: linux-kernel@vger.kernel.org Cc: Arnaldo Carvalho de Melo Subject: Perf misreports userspace args to unoptimized functions Date: Thu, 24 Mar 2016 20:25:11 -0700 Message-ID: <87egaz1boo.fsf@secretsauce.net> MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi. I'm seeing an issue with the way perf interfaces with uprobes that results in incorrect printing of function arguments under fairly normal conditions. Example: Let's say I have a trivial C program tst.c: ============================= #include #include int f(int x) { return !printf("%d\n", x); } int main(int argc, char* argv[]) { return f(argc); } ============================== I ask perf to trap all calls to f() and to give me the value of the x argument: ============================== $ gcc-5 -g -o tst tst.c && perf probe -x tst --add 'f x' && perf record -eprobe_tst:f ./tst 2 3 4 && perf script Added new event: probe_tst:f (on f in /tmp/tst with x) You can now use it in all perf tools, such as: perf record -e probe_tst:f -aR sleep 1 4 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.016 MB perf.data (1 samples) ] tst 24626 [003] 98586.485680: probe_tst:f: (4004e6) x=0 ============================== Note that the value passed to f() was 4, but perf reported it as 0 instead. If I look at what uprobes was actually asked to report, I see this: ============================== $ cat /sys/kernel/debug/tracing/uprobe_events p:probe_tst/f /tmp/tst:0x00000000000004e6 x=-12(%sp):s32 ============================== The corresponding disassembly is: ============================== $ objdump -d tst | awk '/:/,/^$/' 00000000004004e6 : 4004e6: 55 push %rbp 4004e7: 48 89 e5 mov %rsp,%rbp 4004ea: 48 83 ec 10 sub $0x10,%rsp 4004ee: 89 7d fc mov %edi,-0x4(%rbp) 4004f1: 8b 45 fc mov -0x4(%rbp),%eax 4004f4: 89 c6 mov %eax,%esi 4004f6: bf b4 05 40 00 mov $0x4005b4,%edi 4004fb: b8 00 00 00 00 mov $0x0,%eax 400500: e8 bb fe ff ff callq 4003c0 400505: 85 c0 test %eax,%eax 400507: 0f 94 c0 sete %al 40050a: 0f b6 c0 movzbl %al,%eax 40050d: c9 leaveq 40050e: c3 retq ============================== So uprobes was looking at the argument as a local stack variable. However the trap was placed at the start of the function, where the stack variable wasn't yet available (the argument is still in %di). This doesn't happen with optimized code, because (at least in this simple example) the variable is simply kept in %di, the DWARF data indicates this, and perf picks that up. Any particular thoughts about how to solve this? Options are 1. place the trace after the local variables are set (4004f1 in this example) 2. Look at %di instead of the stack variable Is there enough information in the DWARF to know to do either of these? I didn't see anything obvious. Is either option reliable? I.e. could the optimizer break them? Thanks dima