From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751394AbcFWOUk (ORCPT ); Thu, 23 Jun 2016 10:20:40 -0400 Received: from mx3-phx2.redhat.com ([209.132.183.24]:42555 "EHLO mx3-phx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751141AbcFWOUe convert rfc822-to-8bit (ORCPT ); Thu, 23 Jun 2016 10:20:34 -0400 Date: Thu, 23 Jun 2016 10:20:31 -0400 (EDT) From: Chunyu Hu Reply-To: Chunyu Hu To: Steven Rostedt Cc: linux-kernel@vger.kernel.org Message-ID: <746408767.4672536.1466691631491.JavaMail.zimbra@redhat.com> In-Reply-To: <20160622090814.0bd039e3@gandalf.local.home> References: <1466506666-28804-1-git-send-email-chuhu@redhat.com> <20160622090814.0bd039e3@gandalf.local.home> Subject: Re: [PATCH] tracing: Fix oops caused by graph notrace filter MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT X-Originating-IP: [10.68.5.41] X-Mailer: Zimbra 8.0.6_GA_5922 (ZimbraWebClient - FF38 (Linux)/8.0.6_GA_5922) Thread-Topic: tracing: Fix oops caused by graph notrace filter Thread-Index: 7/2VAQWBqHsi/gMrDs+VHuu4goRU0g== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org ----- Original Message ----- > From: "Steven Rostedt" > To: "Chunyu Hu" > Cc: linux-kernel@vger.kernel.org > Sent: Wednesday, June 22, 2016 9:08:14 PM > Subject: Re: [PATCH] tracing: Fix oops caused by graph notrace filter > > On Tue, 21 Jun 2016 18:57:46 +0800 > Chunyu Hu wrote: > > > wakeup tracer can use function_graph trace when display_graph trace > > option is setup by user via tracefs, and bypass the set_graph_function > > and set_graph_notrace. But the bypass of set_graph_notrace is not clean. > > Although wakeup_graph_entry does most of the bypass, and both the enry > > and exit event will be submitted to the trace ring buffer, the ret_stack > > index, which will be assigned to depth field of graph enrty event is not > > handled. The issue is that the depth is used as the array index of > > fgraph_cpu_data and can cause an oops when it's negative. irqsoff tracer > > has same issue. To see the oops: > > > > echo 1 > options/display_graph > > echo schedule > set_graph_notrace > > echo wakeup > current_tracer > > cat trace > > cat trace > > I'm not able to trigger an oops, but I do see them being traced. Actually this needs to 'cat trace' for more than two times, after collecting three graph entry events at least, as the print showed the depth was -65534. One example call trace is like this, it's usually the same call trace: [73710.161617] BUG: unable to handle kernel paging request at ffffe8fdf79836e0 [73710.168616] IP: [] print_graph_entry+0x3c4/0x3e0 [73710.174815] PGD 272759067 PUD 272758067 PMD 0 [73710.179313] Oops: 0002 [#1] SMP [73710.182454] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ocrdma ib_core intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper be2iscsi ipmi_ssif cryptd ipmi_devintf cdc_ether iscsi_boot_sysfs libiscsi sg ipmi_si scsi_transport_iscsi iTCO_wdt usbnet pcspkr iTCO_vendor_support mii ipmi_msghandler ioatdma shpchp dca lpc_ich i2c_i801 i7core_edac mfd_core edac_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_generic pata_acpi drm mptsas ata_piix scsi_transport_sas mptscsih libata crc32c_intel serio_raw be2net mptbase i2c_core bnx2 dm_mirror dm_region_hash dm_log dm_mod [73710.269323] CPU: 8 PID: 31611 Comm: cat Not tainted 4.7.0-rc3 #1 [73710.275325] Hardware name: IBM System x3550 M3 -[7944I21]-/69Y4438 , BIOS -[D6E157AUS-1.15]- 06/13/2012 [73710.285054] task: ffff88007f213d80 ti: ffff88007f238000 task.ti: ffff88007f238000 [73710.292528] RIP: 0010:[] [] print_graph_entry+0x3c4/0x3e0 [73710.301149] RSP: 0018:ffff88007f23bc68 EFLAGS: 00010286 [73710.306458] RAX: ffffe8fdf7a036c0 RBX: ffff88007f2f0000 RCX: ffffffff810ef390 [73710.313585] RDX: ffffffffffff0002 RSI: 0000000000000282 RDI: ffffffff81c81de0 [73710.320713] RBP: ffff88007f23bca8 R08: 0000000000000003 R09: ffff880444a69ea0 [73710.327839] R10: 0000000000000001 R11: 000000000000000b R12: ffff88007f23bce0 [73710.334966] R13: ffff88007f2f1098 R14: ffff880444a69ea0 R15: 0000000000000038 [73710.342094] FS: 00007fda5c740740(0000) GS:ffff880277b00000(0000) knlGS:0000000000000000 [73710.350177] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [73710.355918] CR2: ffffe8fdf79836e0 CR3: 0000000033c32000 CR4: 00000000000006e0 [73710.363044] Stack: [73710.365059] ffff8802730ce400 ffff880444a69ea0 000000007f23bcc8 ffff88007f2f0000 [73710.372524] ffff88007f2f1098 0000000000000000 0000000000000038 ffff88027557698c [73710.379991] ffff88007f23bd28 ffffffff81161329 0000000000000aa8 0000000000000994 [73710.387456] Call Trace: [73710.389906] [] print_graph_function_flags+0xb9/0x500 [73710.396514] [] ? rcu_sched_qs+0x60/0x60 [73710.401998] [] wakeup_print_line+0x20/0x30 [73710.407740] [] print_trace_line+0x54/0x510 [73710.413481] [] s_show+0x29/0x150 [73710.418357] [] seq_read+0x22e/0x370 [73710.423495] [] __vfs_read+0x37/0x150 [73710.428716] [] ? __vfs_read+0x5/0x150 [73710.434027] [] ? security_file_permission+0xa3/0xc0 [73710.440550] [] ? __vfs_read+0x5/0x150 [73710.445860] [] vfs_read+0x8e/0x140 [73710.450908] [] SyS_read+0x55/0xc0 [73710.455872] [] do_syscall_64+0x62/0x110 [73710.461356] [] entry_SYSCALL64_slow_path+0x25/0x25 [73710.467791] Code: 42 18 49 89 41 38 48 8b 42 20 49 89 41 40 48 8b 42 28 49 89 41 48 4c 89 e0 e9 93 fd ff ff 89 50 08 e9 95 fc ff ff 49 8b 4c 24 08 <48> 89 4c d0 10 e9 2f ff ff ff 4c 89 e0 e9 74 fd ff ff 66 2e 0f [73710.487945] RIP [] print_graph_entry+0x3c4/0x3e0 [73710.494229] RSP [73710.497714] CR2: ffffe8fdf79836e0 > > > > Making ftrace_graph_notrace_addr always return false when tracers need > > to bypass it is a proposed fix. > > There's a lot of things wrong with this patch, but before we go there, Thanks for the attention. Agree :/. , I found one wrong, I read the code lines again, the graph_ret event won't be added to ring buffer in this case. And the trying to bypass is not so wise, I was thinking why filters were ignored in this tracer? overhead reason? Looks like it's not so. > why should the latency tracers ignore set_graph_notrace? There are two directions. I went to the wrong. If we resolve this through enabling this graph notrace filter, it's also providing a small extra function of this tracer. > What about this patch? > > -- Steve > > diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c > index 03cdff84d026..68fcee5097d2 100644 > --- a/kernel/trace/trace_irqsoff.c > +++ b/kernel/trace/trace_irqsoff.c > @@ -175,6 +175,9 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent > *trace) > int ret; > int pc; > > + if (trace->depth < 0) > + return 0; > + I tried the patch, if just for resolving the oops issue, it's indeed enough. But if we care about just this filter, and compare this with the trace_graph_entry filter handling, I found that, the depth 0 item will be submitted to the ring buffer, not respecting the filter. I still use the same steps. It's possible to show schedule entry: 419.456410 | -0 | .N.. | schedule_preempt_disabled() { 419.456410 | -0 | .N.. | schedule() { ------------------------------------------ 3) -0 => rngd-1979 ------------------------------------------ 419.456425 | rngd-1979 | d... 0.405 us | finish_task_switch(); 419.456428 | rngd-1979 | .... | __fdget() { 419.456428 | rngd-1979 | .... 0.913 us | __fget_light(); 419.456430 | rngd-1979 | .... 2.103 us | } 419.456431 | rngd-1979 | .... 0.241 us | random_poll(); 419.456433 | rngd-1979 | .... | poll_schedule_timeout() { 419.456433 | rngd-1979 | .... | schedule_hrtimeout_range() { 419.456434 | rngd-1979 | .... | schedule_hrtimeout_range_clock() { 419.456435 | rngd-1979 | .... | schedule() { 419.456444 | rngd-1979 | d... 0.000 us | __schedule(); 419.456445 | rngd-1979 | d... | /* 1979:120:S ==> [003] 1494:100:R kworker/3:1H */ And compared with trace_graph_entry function, don't we need both the two check? + if (trace->depth < 0) + return 0; + if (ftrace_graph_notrace_addr(trace->func)) + return 1; > if (!func_prolog_dec(tr, &data, &flags)) > return 0; > > diff --git a/kernel/trace/trace_sched_wakeup.c > b/kernel/trace/trace_sched_wakeup.c > index 9d4399b553a3..ce9889bff8b4 100644 > --- a/kernel/trace/trace_sched_wakeup.c > +++ b/kernel/trace/trace_sched_wakeup.c > @@ -239,6 +239,9 @@ static int wakeup_graph_entry(struct ftrace_graph_ent > *trace) > unsigned long flags; > int pc, ret = 0; > > + if (trace->depth < 0) > + return 0; > + Do't we need this check? + if (ftrace_graph_notrace_addr(trace->func)) + return 1; > if (!func_prolog_preempt_disable(tr, &data, &pc)) > return 0; > > -- Regards, Chunyu Hu