From: Steven Rostedt <rostedt@goodmis.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Andy Lutomirski <luto@amacapital.net>,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>,
Nicolai Stange <nstange@suse.de>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
"H. Peter Anvin" <hpa@zytor.com>,
"the arch/x86 maintainers" <x86@kernel.org>,
Josh Poimboeuf <jpoimboe@redhat.com>,
Jiri Kosina <jikos@kernel.org>, Miroslav Benes <mbenes@suse.cz>,
Petr Mladek <pmladek@suse.com>,
Joe Lawrence <joe.lawrence@redhat.com>,
Shuah Khan <shuah@kernel.org>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Mimi Zohar <zohar@linux.ibm.com>, Juergen Gross <jgross@suse.com>,
Nick Desaulniers <ndesaulniers@google.com>,
Nayna Jain <nayna@linux.ibm.com>,
Masahiro Yamada <yamada.masahiro@socionext.com>,
Joerg Roedel <jroedel@suse.de>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@vger.kernel.org>,
stable <stable@vger.kernel.org>,
Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [RFC][PATCH 1/2] x86: Allow breakpoints to emulate call functions
Date: Mon, 6 May 2019 20:10:14 -0400 [thread overview]
Message-ID: <20190506201014.484e7b65@oasis.local.home> (raw)
In-Reply-To: <CAHk-=wje38dbYFGNw0y==zd7Zo_4s2WOQjWaBDyr24RCdK2EPQ@mail.gmail.com>
On Mon, 6 May 2019 15:31:57 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, May 6, 2019 at 3:06 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Why are you emulating something different than what you are rewriting?
>
> Side note: I'm also finding another bug on the ftrace side, which is a
> simple race condition.
>
> In particular, the logic with 'modifying_ftrace_code' is fundamentally racy.
>
> What can happen is that on one CPU we rewrite one instruction:
>
> ftrace_update_func = ip;
> /* Make sure the breakpoints see the ftrace_update_func update */
> smp_wmb();
>
> /* See comment above by declaration of modifying_ftrace_code */
> atomic_inc(&modifying_ftrace_code);
>
> ret = ftrace_modify_code(ip, old, new);
>
> atomic_dec(&modifying_ftrace_code);
>
> but then another CPU hits the 'int3' while the modification is
> going on, and takes the fault.
>
> The fault handler does that
>
> if (unlikely(atomic_read(&modifying_ftrace_code))..
>
> and sees that "yes, it's in the middle of modifying the ftrace code",
> and calls ftrace_int3_handler(). All good and "obviously correct" so
> far, no?
>
> HOWEVER. It's actually buggy. Because in the meantime, the CPU that
> was rewriting instructions has finished, and decrements the
> modifying_ftrace_code, which doesn't hurt us (because we already saw
> that the int3 was due to the modification.
But the CPU that was rewriting instructions does a run_sync() after
removing the int3:
static void run_sync(void)
{
int enable_irqs;
/* No need to sync if there's only one CPU */
if (num_online_cpus() == 1)
return;
enable_irqs = irqs_disabled();
/* We may be called with interrupts disabled (on bootup). */
if (enable_irqs)
local_irq_enable();
on_each_cpu(do_sync_core, NULL, 1);
if (enable_irqs)
local_irq_disable();
}
Which sends an IPI to all CPUs to make sure they no longer see the int3.
>
> BUT! There are two different races here:
>
> (a) maybe the fault handling was slow, and we saw the 'int3' and took
> the fault, but the modifying CPU had already finished, so that
> atomic_read(&modifying_ftrace_code) didn't actually trigger at all.
>
> (b) maybe the int3-faulting CPU *did* see the proper value of
> modifying_ftrace_code, but the modifying CPU went on and started
> *another* modification, and has changed ftrace_update_func in the
> meantime, so now the int3 handling is looking at the wrong values!
>
> In the case of (a), we'll die with an oops due to the inexplicable
> 'int3' we hit. And in the case of (b) we'll be fixing up using the
> wrong address.
>
> Things like this is why I'm wondering how much of the problems are due
> to the entry code, and how much of it is due to simply races and
> timing differences?
>
> Again, I don't actually know the ftrace code, and maybe I'm missing
> something, but this really looks like _another_ fundamental bug.
>
> The way to handle that modifying_ftrace_code thing is most likely by
> using a sequence counter. For example, one way to actually do some
> thing like this might be
>
> ftrace_update_func = ip;
> ftrace_update_target = func;
> smp_wmb();
> atomic_inc(&modifying_ftrace_head);
>
> ret = ftrace_modify_code(ip, old, new);
>
> atomic_inc(&modifying_ftrace_tail);
> smp_wmb();
>
> and now the int3 code could do something like
>
> int head, tail;
>
> head = atomic_read(&modifying_ftrace_head);
> smp_rmb();
> tail = atomic_read(&modifying_ftrace_tail);
>
> /* Are we still in the process of modification? */
> if (unlikely(head != tail+1))
> return 0;
>
> ip = ftrace_update_func;
> func = ftrace_update_target;
> smp_rmb();
> /* Need to re-check that the above two values are consistent
> and we didn't exit */
> if (atomic_read(&modifying_ftrace_tail) != tail)
> return 0;
>
> *pregs int3_emulate_call(regs, ip, func);
> return 1;
>
> although it probably really would be better to use a seqcount instead
> of writing it out like the above.
>
> NOTE! The above only fixes the (b) race. The (a) race is probably best
> handled by actually checking if the 'int3' instruction is still there
> before dying.
>
> Again, maybe there's something I'm missing, but having looked at that
> patch now what feels like a million times, I'm finding more worrisome
> things in the ftrace code than in the kernel entry code..
I think you are missing the run_sync() which is the heavy hammer to
make sure all CPUs are in sync. And this is done at each stage:
add int3
run_sync();
update call cite outside of int3
run_sync()
remove int3
run_sync()
HPA said that the last run_sync() isn't needed, but I kept it because I
wanted to make sure. Looks like your analysis shows that it is needed.
-- Steve
next prev parent reply other threads:[~2019-05-07 0:10 UTC|newest]
Thread overview: 103+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-01 20:28 [RFC][PATCH 0/2] ftrace/x86: Allow for breakpoint handlers to emulate call functions Steven Rostedt
2019-05-01 20:28 ` [RFC][PATCH 1/2] x86: Allow breakpoints " Steven Rostedt
2019-05-02 3:24 ` Steven Rostedt
2019-05-02 16:21 ` Peter Zijlstra
2019-05-02 16:29 ` Peter Zijlstra
2019-05-02 18:02 ` Linus Torvalds
2019-05-02 18:18 ` Peter Zijlstra
2019-05-02 18:30 ` Peter Zijlstra
2019-05-02 18:43 ` Linus Torvalds
2019-05-02 19:28 ` Jiri Kosina
2019-05-02 20:25 ` Andy Lutomirski
2019-05-02 20:21 ` Peter Zijlstra
2019-05-02 20:49 ` Linus Torvalds
2019-05-02 21:32 ` Peter Zijlstra
2019-05-03 19:24 ` Steven Rostedt
2019-05-03 21:46 ` Linus Torvalds
2019-05-03 22:49 ` Steven Rostedt
2019-05-03 23:07 ` Linus Torvalds
2019-05-04 4:17 ` Steven Rostedt
[not found] ` <CAHk-=wiuSFbv_rELND-BLWcP0GSZ0yF=xOAEcf61GE3bU9d=yg@mail.gmail.com>
2019-05-04 18:59 ` Linus Torvalds
2019-05-04 20:12 ` Andy Lutomirski
2019-05-04 20:28 ` Linus Torvalds
2019-05-04 20:36 ` Linus Torvalds
2019-05-03 22:55 ` Andy Lutomirski
2019-05-03 23:16 ` Linus Torvalds
2019-05-03 23:32 ` Andy Lutomirski
2019-05-02 22:52 ` Steven Rostedt
2019-05-02 23:31 ` Steven Rostedt
2019-05-02 23:50 ` Steven Rostedt
2019-05-03 1:51 ` [RFC][PATCH 1/2 v2] " Steven Rostedt
2019-05-03 9:29 ` [RFC][PATCH 1/2] " Peter Zijlstra
2019-05-03 13:22 ` Steven Rostedt
2019-05-03 16:20 ` Andy Lutomirski
2019-05-03 16:31 ` Steven Rostedt
2019-05-03 16:35 ` Peter Zijlstra
2019-05-03 16:44 ` Andy Lutomirski
2019-05-03 16:49 ` Steven Rostedt
2019-05-03 16:32 ` Peter Zijlstra
2019-05-03 18:57 ` Linus Torvalds
2019-05-06 8:19 ` Peter Zijlstra
2019-05-06 13:56 ` Steven Rostedt
2019-05-06 16:17 ` Linus Torvalds
2019-05-06 16:19 ` Linus Torvalds
2019-05-06 17:06 ` Steven Rostedt
2019-05-06 18:06 ` Linus Torvalds
2019-05-06 18:57 ` Steven Rostedt
2019-05-06 19:46 ` Linus Torvalds
2019-05-06 20:29 ` Steven Rostedt
2019-05-06 20:42 ` Linus Torvalds
2019-05-06 20:44 ` Linus Torvalds
2019-05-06 21:45 ` Steven Rostedt
2019-05-06 22:06 ` Linus Torvalds
2019-05-06 22:31 ` Linus Torvalds
2019-05-07 0:10 ` Steven Rostedt [this message]
2019-05-07 1:06 ` Linus Torvalds
2019-05-07 1:04 ` Steven Rostedt
2019-05-07 1:34 ` Steven Rostedt
2019-05-07 1:34 ` Linus Torvalds
2019-05-07 1:53 ` Steven Rostedt
2019-05-07 2:22 ` Linus Torvalds
2019-05-07 2:58 ` Steven Rostedt
2019-05-07 3:05 ` Linus Torvalds
2019-05-07 3:21 ` Steven Rostedt
2019-05-07 3:28 ` Linus Torvalds
2019-05-07 14:54 ` Linus Torvalds
2019-05-07 15:12 ` Steven Rostedt
2019-05-07 15:25 ` Steven Rostedt
2019-05-07 16:25 ` Steven Rostedt
2019-05-07 15:31 ` Linus Torvalds
2019-05-07 15:45 ` Steven Rostedt
2019-05-07 16:34 ` Peter Zijlstra
2019-05-07 17:08 ` Linus Torvalds
2019-05-07 17:21 ` Josh Poimboeuf
2019-05-07 21:24 ` Steven Rostedt
2019-05-08 4:50 ` Linus Torvalds
2019-05-08 16:37 ` Steven Rostedt
2019-05-07 17:38 ` Peter Zijlstra
2019-05-07 9:51 ` Peter Zijlstra
2019-05-07 14:48 ` Andy Lutomirski
2019-05-07 14:57 ` Linus Torvalds
2019-05-07 14:13 ` Masami Hiramatsu
2019-05-07 17:15 ` Masami Hiramatsu
2019-05-06 14:22 ` Peter Zijlstra
2019-05-07 8:57 ` Peter Zijlstra
2019-05-07 9:18 ` David Laight
2019-05-07 11:30 ` Peter Zijlstra
2019-05-07 12:57 ` David Laight
2019-05-07 13:14 ` Steven Rostedt
2019-05-07 14:50 ` David Laight
2019-05-07 14:57 ` Steven Rostedt
2019-05-07 15:46 ` David Laight
2019-05-07 13:32 ` Peter Zijlstra
2019-05-07 9:27 ` Peter Zijlstra
2019-05-07 12:27 ` Steven Rostedt
2019-05-07 12:41 ` Peter Zijlstra
2019-05-07 12:54 ` Steven Rostedt
2019-05-07 17:22 ` Masami Hiramatsu
2019-05-07 14:28 ` Peter Zijlstra
2019-05-02 20:48 ` Steven Rostedt
2019-05-06 15:14 ` Josh Poimboeuf
2019-05-01 20:28 ` [RFC][PATCH 2/2] ftrace/x86: Emulate call function while updating in breakpoint handler Steven Rostedt
2019-05-03 10:22 ` [RFC][PATCH 1.5/2] x86: Add int3_emulate_call() selftest Peter Zijlstra
2019-05-03 18:46 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190506201014.484e7b65@oasis.local.home \
--to=rostedt@goodmis.org \
--cc=akpm@linux-foundation.org \
--cc=bigeasy@linutronix.de \
--cc=bp@alien8.de \
--cc=hpa@zytor.com \
--cc=jgross@suse.com \
--cc=jikos@kernel.org \
--cc=joe.lawrence@redhat.com \
--cc=jpoimboe@redhat.com \
--cc=jroedel@suse.de \
--cc=konrad.wilk@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=luto@kernel.org \
--cc=mbenes@suse.cz \
--cc=mhiramat@kernel.org \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=nayna@linux.ibm.com \
--cc=ndesaulniers@google.com \
--cc=nstange@suse.de \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=shuah@kernel.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=tim.c.chen@linux.intel.com \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
--cc=yamada.masahiro@socionext.com \
--cc=zohar@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox