All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sven Schnelle <svens@linux.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: ftrace startup tests crashing due to missing rcu_synchronize()
Date: Wed, 16 Feb 2022 20:23:55 +0100	[thread overview]
Message-ID: <yt9dee424nuc.fsf@linux.ibm.com> (raw)
In-Reply-To: <20220216135419.01d96fe1@gandalf.local.home> (Steven Rostedt's message of "Wed, 16 Feb 2022 13:54:19 -0500")

Steven Rostedt <rostedt@goodmis.org> writes:

> On Wed, 16 Feb 2022 19:39:03 +0100
> Sven Schnelle <svens@linux.ibm.com> wrote:
>> [    4.460091] Unable to handle kernel pointer dereference in virtual kernel address space
>> [    4.460375] Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
>> [    4.460458] Fault in home space mode while using kernel ASCE.
>> [    4.460695] AS:000000008561c007 R3:0000000000000024
>> [    4.461143] Oops: 0038 ilc:3 [#1] PREEMPT SMP
>> [    4.461162] Modules linked in:
>> [    4.461175] CPU: 245 PID: 0 Comm: swapper/245 Not tainted 5.17.0-rc4-00051-gc5d9ae265b10-dirty #4
>> [    4.461183] Hardware name: IBM 8561 T01 701 (KVM/Linux)
>
> I this a 390?

Yes.

>> Looking at unregister_ftrace_function(), i noticed that
>> ftrace_shutdown() is called with 0 as command. Given that the ftrace
>> function didn't change and ftrace is still enabled, the
>> rcu_synchronize() functions in ftrace_shutdown() are silently skipped.
>> So the caller frees ops already before other CPUs have gone through
>> quiesce, and may therefore use the old (now freed) list entry.
>> 
>> To fix this, i wonder whether we should change the code in
>> unregister_ftrace_function() to:
>> 
>> @@ -7827,7 +7837,7 @@ int unregister_ftrace_function(struct ftrace_ops *ops)
>>         int ret;
>>  
>>         mutex_lock(&ftrace_lock);
>> -       ret = ftrace_shutdown(ops, 0);
>> +       ret = ftrace_shutdown(ops, FTRACE_UPDATE_TRACE_FUNC);
>
> No, the ftrace_shutdown() will add that flag if it is needed.
>
>>         mutex_unlock(&ftrace_lock);
>>  
>>         return ret;
>> 
>> I haven't checked whether other callsites of unregister_ftrace_function()
>> also need to be adjusted. What do you think about that 'fix'?
>
> But what I'm thinking is, the function is being freed but has yet to be
> removed from the list. Or that a synchronization is missed.
>
> That is, shutdown is called, the item is removed from the list and freed,
> but something got preempted while on the ftrace trampoline, with a
> reference to the item, and then woke up and executed the item that was
> freed.
>
> I'll look into it. Thanks for the report.

With additional debugging i see:

@@ -2967,14 +2974,17 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
        }
 
        if (!command || !ftrace_enabled) {
+               pr_err("%s: skipping rcu_synchronize(): ops=%pS command=%d ftrace_enabled=%d saved func=%pS ftrace_trace_func=%pS\n",
+                      __func__, ops, command, ftrace_enabled, saved_ftrace_func, ftrace_trace_function);
                /*
                 * If these are dynamic or per_cpu ops, they still
                 * need their data freed. Since, function tracing is
                 * not currently active, we can just free them
                 * without synchronizing all CPUs.
                 */

[  +0.000011] unregister_ftrace_function: 0x2f6792e00
[  +0.000023] removing ops 00000002f6792e00 trace_selftest_test_dyn_func+0x0/0x18
[  +0.000661] ftrace_shutdown: skipping rcu_synchronize(): ops=0x2f6792e00 command=0 ftrace_enabled=1 saved func=arch_ftrace_ops_list_func+0x0/0x1b0 ft>
[  +0.010032] unregister_ftrace_function: test_probe1+0x0/0x1b0
[  +0.000017] removing ops 000000009d876e40 trace_selftest_test_probe1_func+0x0/0x18
[  +0.000053] ftrace_shutdown: skipping rcu_synchronize(): ops=test_probe1+0x0/0x1b0 command=0 ftrace_enabled=1 saved func=arch_ftrace_ops_list_func+0x>
[  +0.000017] unregister_ftrace_function: test_probe2+0x0/0x1b0
[  +0.000081] removing ops 000000009d876ff0 trace_selftest_test_probe2_func+0x0/0x18
[  +0.000064] ftrace_shutdown: skipping rcu_synchronize(): ops=test_probe2+0x0/0x1b0 command=0 ftrace_enabled=1 saved func=arch_ftrace_ops_list_func+0x>
[  +0.000015] unregister_ftrace_function: test_probe3+0x0/0x1b0
[  +0.000011] removing ops 000000009d8771a0 trace_selftest_test_probe3_func+0x0/0x18
[  +0.000108] unregister_ftrace_function: global_ops+0x0/0x1b0
[  +0.000010] removing ops 000000009d9325b8 trace_selftest_test_global_func+0x0/0x18
[  +0.025759] PASSED

So the rcu_synchronize is definitely skipped. Another thing i was
wondering was whether we need to reset the next pointer in
the to-be-removed entry in remove_ftrace_ops(). But i haven't
investigated that in detail yet.

      parent reply	other threads:[~2022-02-16 19:24 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-16 18:39 ftrace startup tests crashing due to missing rcu_synchronize() Sven Schnelle
2022-02-16 18:54 ` Steven Rostedt
2022-02-16 18:58   ` Steven Rostedt
2022-02-16 19:31     ` Sven Schnelle
2022-02-16 19:23   ` Sven Schnelle [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=yt9dee424nuc.fsf@linux.ibm.com \
    --to=svens@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.