From: Sven Schnelle <svens@linux.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: ftrace startup tests crashing due to missing rcu_synchronize()
Date: Wed, 16 Feb 2022 19:39:03 +0100 [thread overview]
Message-ID: <yt9dilte4px4.fsf@linux.ibm.com> (raw)
Hi Steve,
in our CI we see the ftrace selftests crashing. It only happens
in rare cases - for me i can boot the kernel hundred of times and
it crashes only once. The oops output looks like this:
[ 3.523720] Running postponed tracer tests:
[ 3.524184] Testing tracer function: PASSED
[ 3.801523] Testing dynamic ftrace: PASSED
[ 4.068009] Testing dynamic ftrace ops #1:
[ 4.140142] (1 0 1 0 0)
[ 4.140149] (1 1 2 0 0)
[ 4.200037] (2 1 3 0 822270)
[ 4.200046] (2 2 4 0 822402)
[ 4.232728] (3 2 4 0 1147236)
[ 4.232740] (3 3 5 0 1147377)
[ 4.300835] PASSED
[ 4.300840] Testing dynamic ftrace ops #2:
[ 4.420423] (1 0 1 582267 0)
[ 4.420474] (1 1 2 583259 0)
[ 4.420941] (2 1 3 1 5)
[ 4.420947] (2 2 4 69 73)
[ 4.458483] (3 2 4 494199 491341)
[ 4.458616] (3 3 5 497496 494474)
[ 4.460091] Unable to handle kernel pointer dereference in virtual kernel address space
[ 4.460375] Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
[ 4.460458] Fault in home space mode while using kernel ASCE.
[ 4.460695] AS:000000008561c007 R3:0000000000000024
[ 4.461143] Oops: 0038 ilc:3 [#1] PREEMPT SMP
[ 4.461162] Modules linked in:
[ 4.461175] CPU: 245 PID: 0 Comm: swapper/245 Not tainted 5.17.0-rc4-00051-gc5d9ae265b10-dirty #4
[ 4.461183] Hardware name: IBM 8561 T01 701 (KVM/Linux)
[ 4.461194] Krnl PSW : 0404e00180000000 00000000835a1fe6 (arch_ftrace_ops_list_func+0x96/0x1b0)
[ 4.461212] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 4.461226] Krnl GPRS: 0000000000000000 0000000084bbc110 0000000083496f48 00000000834061f2
[ 4.461237] 000000008a70fa00 0000038000d0fba8 0000000000000001 fffffffffffffeff
[ 4.461243] 00000000834061f2 0000038000d0fba8 6b6b6b6b6b6b6b6b 0000000083496f48
[ 4.461250] 0000000000000000 0000000000000000 00000000835a209a 0000038000d0fab8
[ 4.461269] Krnl Code: 00000000835a1fd4: e310a9380224 stg %r1,10552(%r10)
00000000835a1fda: eb0103a8006a asi 936,1
#00000000835a1fe0: c4a800b0d094 lgrl %r10,0000000084bbc108
>00000000835a1fe6: e310a0100004 lg %r1,16(%r10)
00000000835a1fec: a7110020 tmll %r1,32
00000000835a1ff0: a7740012 brc 7,00000000835a2014
00000000835a1ff4: a7114000 tmll %r1,16384
00000000835a1ff8: a7740031 brc 7,00000000835a205a
[ 4.461383] Call Trace:
[ 4.461388] [<00000000835a1fe6>] arch_ftrace_ops_list_func+0x96/0x1b0
[ 4.461395] ([<00000000835a200e>] arch_ftrace_ops_list_func+0xbe/0x1b0)
[ 4.461401] [<00000000841700ec>] ftrace_common+0x1c/0x20
[ 4.461409] [<0000000083496f4e>] preempt_count_sub+0x6/0x138
[ 4.461418] [<00000000834061f2>] read_tod_clock+0x4a/0xb0
[ 4.461431] [<0000000083530818>] ktime_get+0xb8/0x178
[ 4.461441] [<000000008353ec94>] clockevents_program_event+0x6c/0x138
[ 4.461454] [<000000008352d206>] hrtimer_start_range_ns+0xc6/0x100
[ 4.461467] [<0000000083541068>] tick_nohz_idle_stop_tick+0x90/0xf8
[ 4.461479] [<00000000834a9790>] do_idle+0xf0/0x1b0
[ 4.461495] [<00000000834a9ad6>] cpu_startup_entry+0x36/0x40
[ 4.461509] [<000000008416e48e>] restart_int_handler+0x6e/0x90
[ 4.461521] INFO: lockdep is turned off.
[ 4.461527] Last Breaking-Event-Address:
[ 4.461533] [<00000000835a2026>] arch_ftrace_ops_list_func+0xd6/0x1b0
[ 4.461552] Kernel panic - not syncing: Fatal exception: panic_on_oops
The code in question is the ops loop in __ftrace_ops_list_func():
do_for_each_ftrace_op(op, ftrace_ops_list) {
/* Stub functions don't need to be called nor tested */
if (op->flags & FTRACE_OPS_FL_STUB)
continue;
...
The register (r10) where op is stored contains 0x6b6b6b6b6b6b6b6b
(POISON_FREE), so it looks like the list contains a member that is
already freed.
Looking at unregister_ftrace_function(), i noticed that
ftrace_shutdown() is called with 0 as command. Given that the ftrace
function didn't change and ftrace is still enabled, the
rcu_synchronize() functions in ftrace_shutdown() are silently skipped.
So the caller frees ops already before other CPUs have gone through
quiesce, and may therefore use the old (now freed) list entry.
To fix this, i wonder whether we should change the code in
unregister_ftrace_function() to:
@@ -7827,7 +7837,7 @@ int unregister_ftrace_function(struct ftrace_ops *ops)
int ret;
mutex_lock(&ftrace_lock);
- ret = ftrace_shutdown(ops, 0);
+ ret = ftrace_shutdown(ops, FTRACE_UPDATE_TRACE_FUNC);
mutex_unlock(&ftrace_lock);
return ret;
I haven't checked whether other callsites of unregister_ftrace_function()
also need to be adjusted. What do you think about that 'fix'?
Thanks,
Sven
next reply other threads:[~2022-02-16 18:39 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-16 18:39 Sven Schnelle [this message]
2022-02-16 18:54 ` ftrace startup tests crashing due to missing rcu_synchronize() Steven Rostedt
2022-02-16 18:58 ` Steven Rostedt
2022-02-16 19:31 ` Sven Schnelle
2022-02-16 19:23 ` Sven Schnelle
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=yt9dilte4px4.fsf@linux.ibm.com \
--to=svens@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.