From: Chunyu Hu <chuhu@redhat.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] tracing: Fix oops caused by graph notrace filter
Date: Thu, 23 Jun 2016 10:20:31 -0400 (EDT) [thread overview]
Message-ID: <746408767.4672536.1466691631491.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20160622090814.0bd039e3@gandalf.local.home>
----- Original Message -----
> From: "Steven Rostedt" <rostedt@goodmis.org>
> To: "Chunyu Hu" <chuhu@redhat.com>
> Cc: linux-kernel@vger.kernel.org
> Sent: Wednesday, June 22, 2016 9:08:14 PM
> Subject: Re: [PATCH] tracing: Fix oops caused by graph notrace filter
>
> On Tue, 21 Jun 2016 18:57:46 +0800
> Chunyu Hu <chuhu@redhat.com> wrote:
>
> > wakeup tracer can use function_graph trace when display_graph trace
> > option is setup by user via tracefs, and bypass the set_graph_function
> > and set_graph_notrace. But the bypass of set_graph_notrace is not clean.
> > Although wakeup_graph_entry does most of the bypass, and both the enry
> > and exit event will be submitted to the trace ring buffer, the ret_stack
> > index, which will be assigned to depth field of graph enrty event is not
> > handled. The issue is that the depth is used as the array index of
> > fgraph_cpu_data and can cause an oops when it's negative. irqsoff tracer
> > has same issue. To see the oops:
> >
> > echo 1 > options/display_graph
> > echo schedule > set_graph_notrace
> > echo wakeup > current_tracer
> > cat trace
> > cat trace
>
> I'm not able to trigger an oops, but I do see them being traced.
Actually this needs to 'cat trace' for more than two times, after collecting
three graph entry events at least, as the print showed the depth was
-65534. One example call trace is like this, it's usually the same call trace:
[73710.161617] BUG: unable to handle kernel paging request at ffffe8fdf79836e0
[73710.168616] IP: [<ffffffff81161254>] print_graph_entry+0x3c4/0x3e0
[73710.174815] PGD 272759067 PUD 272758067 PMD 0
[73710.179313] Oops: 0002 [#1] SMP
[73710.182454] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ocrdma ib_core intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper be2iscsi ipmi_ssif cryptd ipmi_devintf cdc_ether iscsi_boot_sysfs libiscsi sg ipmi_si scsi_transport_iscsi iTCO_wdt usbnet pcspkr iTCO_vendor_support mii ipmi_msghandler ioatdma shpchp dca lpc_ich i2c_i801 i7core_edac mfd_core edac_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_generic pata_acpi drm mptsas ata_piix scsi_transport_sas mptscsih libata crc32c_intel serio_raw be2net mptbase i2c_core bnx2 dm_mirror dm_region_hash dm_log dm_mod
[73710.269323] CPU: 8 PID: 31611 Comm: cat Not tainted 4.7.0-rc3 #1
[73710.275325] Hardware name: IBM System x3550 M3 -[7944I21]-/69Y4438 , BIOS -[D6E157AUS-1.15]- 06/13/2012
[73710.285054] task: ffff88007f213d80 ti: ffff88007f238000 task.ti: ffff88007f238000
[73710.292528] RIP: 0010:[<ffffffff81161254>] [<ffffffff81161254>] print_graph_entry+0x3c4/0x3e0
[73710.301149] RSP: 0018:ffff88007f23bc68 EFLAGS: 00010286
[73710.306458] RAX: ffffe8fdf7a036c0 RBX: ffff88007f2f0000 RCX: ffffffff810ef390
[73710.313585] RDX: ffffffffffff0002 RSI: 0000000000000282 RDI: ffffffff81c81de0
[73710.320713] RBP: ffff88007f23bca8 R08: 0000000000000003 R09: ffff880444a69ea0
[73710.327839] R10: 0000000000000001 R11: 000000000000000b R12: ffff88007f23bce0
[73710.334966] R13: ffff88007f2f1098 R14: ffff880444a69ea0 R15: 0000000000000038
[73710.342094] FS: 00007fda5c740740(0000) GS:ffff880277b00000(0000) knlGS:0000000000000000
[73710.350177] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[73710.355918] CR2: ffffe8fdf79836e0 CR3: 0000000033c32000 CR4: 00000000000006e0
[73710.363044] Stack:
[73710.365059] ffff8802730ce400 ffff880444a69ea0 000000007f23bcc8 ffff88007f2f0000
[73710.372524] ffff88007f2f1098 0000000000000000 0000000000000038 ffff88027557698c
[73710.379991] ffff88007f23bd28 ffffffff81161329 0000000000000aa8 0000000000000994
[73710.387456] Call Trace:
[73710.389906] [<ffffffff81161329>] print_graph_function_flags+0xb9/0x500
[73710.396514] [<ffffffff810ef390>] ? rcu_sched_qs+0x60/0x60
[73710.401998] [<ffffffff8115ea10>] wakeup_print_line+0x20/0x30
[73710.407740] [<ffffffff81158524>] print_trace_line+0x54/0x510
[73710.413481] [<ffffffff81159389>] s_show+0x29/0x150
[73710.418357] [<ffffffff8123ddbe>] seq_read+0x22e/0x370
[73710.423495] [<ffffffff81218b67>] __vfs_read+0x37/0x150
[73710.428716] [<ffffffff81218b35>] ? __vfs_read+0x5/0x150
[73710.434027] [<ffffffff812c9413>] ? security_file_permission+0xa3/0xc0
[73710.440550] [<ffffffff81218b35>] ? __vfs_read+0x5/0x150
[73710.445860] [<ffffffff8121912e>] vfs_read+0x8e/0x140
[73710.450908] [<ffffffff8121a625>] SyS_read+0x55/0xc0
[73710.455872] [<ffffffff81003b12>] do_syscall_64+0x62/0x110
[73710.461356] [<ffffffff816c65a1>] entry_SYSCALL64_slow_path+0x25/0x25
[73710.467791] Code: 42 18 49 89 41 38 48 8b 42 20 49 89 41 40 48 8b 42 28 49 89 41 48 4c 89 e0 e9 93 fd ff ff 89 50 08 e9 95 fc ff ff 49 8b 4c 24 08 <48> 89 4c d0 10 e9 2f ff ff ff 4c 89 e0 e9 74 fd ff ff 66 2e 0f
[73710.487945] RIP [<ffffffff81161254>] print_graph_entry+0x3c4/0x3e0
[73710.494229] RSP <ffff88007f23bc68>
[73710.497714] CR2: ffffe8fdf79836e0
> >
> > Making ftrace_graph_notrace_addr always return false when tracers need
> > to bypass it is a proposed fix.
>
> There's a lot of things wrong with this patch, but before we go there,
Thanks for the attention. Agree :/. , I found one wrong, I read the code
lines again, the graph_ret event won't be added to ring buffer in this case.
And the trying to bypass is not so wise, I was thinking why filters were
ignored in this tracer? overhead reason? Looks like it's not so.
> why should the latency tracers ignore set_graph_notrace?
There are two directions. I went to the wrong. If we resolve this through
enabling this graph notrace filter, it's also providing a small extra function
of this tracer.
> What about this patch?
>
> -- Steve
>
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index 03cdff84d026..68fcee5097d2 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -175,6 +175,9 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent
> *trace)
> int ret;
> int pc;
>
> + if (trace->depth < 0)
> + return 0;
> +
I tried the patch, if just for resolving the oops issue, it's indeed enough.
But if we care about just this filter, and compare this with the trace_graph_entry
filter handling, I found that, the depth 0 item will be submitted to the ring buffer, not
respecting the filter. I still use the same steps. It's possible to show schedule entry:
419.456410 | <idle>-0 | .N.. | schedule_preempt_disabled() {
419.456410 | <idle>-0 | .N.. | schedule() {
------------------------------------------
3) <idle>-0 => rngd-1979
------------------------------------------
419.456425 | rngd-1979 | d... 0.405 us | finish_task_switch();
419.456428 | rngd-1979 | .... | __fdget() {
419.456428 | rngd-1979 | .... 0.913 us | __fget_light();
419.456430 | rngd-1979 | .... 2.103 us | }
419.456431 | rngd-1979 | .... 0.241 us | random_poll();
419.456433 | rngd-1979 | .... | poll_schedule_timeout() {
419.456433 | rngd-1979 | .... | schedule_hrtimeout_range() {
419.456434 | rngd-1979 | .... | schedule_hrtimeout_range_clock() {
419.456435 | rngd-1979 | .... | schedule() {
419.456444 | rngd-1979 | d... 0.000 us | __schedule();
419.456445 | rngd-1979 | d... | /* 1979:120:S ==> [003] 1494:100:R kworker/3:1H */
And compared with trace_graph_entry function, don't we need both the two check?
+ if (trace->depth < 0)
+ return 0;
+ if (ftrace_graph_notrace_addr(trace->func))
+ return 1;
> if (!func_prolog_dec(tr, &data, &flags))
> return 0;
>
> diff --git a/kernel/trace/trace_sched_wakeup.c
> b/kernel/trace/trace_sched_wakeup.c
> index 9d4399b553a3..ce9889bff8b4 100644
> --- a/kernel/trace/trace_sched_wakeup.c
> +++ b/kernel/trace/trace_sched_wakeup.c
> @@ -239,6 +239,9 @@ static int wakeup_graph_entry(struct ftrace_graph_ent
> *trace)
> unsigned long flags;
> int pc, ret = 0;
>
> + if (trace->depth < 0)
> + return 0;
> +
Do't we need this check?
+ if (ftrace_graph_notrace_addr(trace->func))
+ return 1;
> if (!func_prolog_preempt_disable(tr, &data, &pc))
> return 0;
>
>
--
Regards,
Chunyu Hu
next prev parent reply other threads:[~2016-06-23 14:20 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-21 10:57 [PATCH] tracing: Fix oops caused by graph notrace filter Chunyu Hu
2016-06-21 12:22 ` kbuild test robot
2016-06-21 12:39 ` kbuild test robot
2016-06-22 5:33 ` Chunyu Hu
2016-06-22 13:08 ` Steven Rostedt
2016-06-23 14:20 ` Chunyu Hu [this message]
2016-06-23 14:39 ` Steven Rostedt
2016-06-23 14:58 ` Chunyu Hu
2016-06-23 15:11 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=746408767.4672536.1466691631491.JavaMail.zimbra@redhat.com \
--to=chuhu@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.