Re: [RFC PATCH 1/1] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nicolai Stange <nstange@suse.de>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Nicolai Stange <nstange@suse.de>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Jiri Kosina <jikos@kernel.org>, Miroslav Benes <mbenes@suse.cz>,
	Petr Mladek <pmladek@suse.com>,
	live-patching@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org,
	Andy Lutomirski <luto@amacapital.net>
Subject: Re: [RFC PATCH 1/1] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
Date: Tue, 23 Apr 2019 20:15:49 +0200	[thread overview]
Message-ID: <87ef5sfuzu.fsf@suse.de> (raw)
In-Reply-To: <20190419160513.1faa01d2@gandalf.local.home> (Steven Rostedt's message of "Fri, 19 Apr 2019 16:05:13 -0400")

Hi Steven,

many thanks for having a look!

Steven Rostedt <rostedt@goodmis.org> writes:

> I just found this in my Inbox, and this looks to be a legit issue.
>
> On Thu, 26 Jul 2018 12:40:29 +0200
> Nicolai Stange <nstange@suse.de> wrote:
>
> You still working on this?

Yes, this still needs to get fixed somehow, preferably at the ftrace
layer.

>
>> With dynamic ftrace, ftrace patches call sites in a three steps:
>> 1. Put a breakpoint at the to be patched location,
>> 2. update call site and
>> 3. finally remove the breakpoint again.
>> 
>> Note that the breakpoint ftrace_int3_handler() simply makes execution
>> to skip over the to be patched function.
>> 
>> This patching happens in the following circumstances:
>> - the global ftrace_trace_function changes and the call sites at
>>   ftrace_call and ftrace_regs_call get patched,
>> - an ftrace_ops' ->func changes and the call site in its trampoline gets
>>   patched,
>> - graph tracing gets enabled/disabled and the jump site at
>>   ftrace_graph_call gets patched
>> - a mcount site gets converted from nop -> call, call -> nop
>>   or call -> call.
>> 
>> The latter case, i.e. a mcount call getting redirected, happens for example
>> in a transition from trampolined to not trampolined upon a user enabling
>> function tracing on a live patched function.
>> 
>> The ftrace_int3_handler() simply skipping over the mcount callsite is
>> problematic here, because it means that a live patched function gets
>> temporarily reverted to its unpatched original and this breaks live
>> patching consistency.
>> 
>> Make ftrace_int3_handler not to skip over the fops invocation, but modify
>> the interrupted control flow to issue a call as needed.
>> 
>> For determining what the proper action actually is, modify
>> update_ftrace_func() to take an extra argument, func, to be called if the
>> corresponding breakpoint gets hit. Introduce a new global variable,
>> trace_update_func_dest and let update_ftrace_func() set it. For the special
>> case of patching the jmp insn at ftrace_graph_call, set it to zero and make
>> ftrace_int3_handler() just skip over the insn in this case as before.
>> 
>> Because there's no space left above the int3 handler's iret frame to stash
>> an extra call's return address in, this can't be mimicked from the
>> ftrace_int3_handler() itself.
>> 
>> Instead, make ftrace_int3_handler() change the iret frame's %rip slot to
>> point to the new ftrace_int3_call_trampoline to be executed immediately
>> after iret.
>> 
>> The original %rip gets communicated to ftrace_int3_call_trampoline which
>> can then take proper action. Note that ftrace_int3_call_trampoline can
>> nest, because of NMIs, for example. In order to avoid introducing another
>> stack, abuse %r11 for passing the original %rip. This is safe, because the
>> interrupted context is always at a call insn and according to the x86_64
>> ELF ABI allows %r11 is callee-clobbered. According to the glibc sources,
>> this is also true for the somewhat special mcount/fentry ABI.
>> 
>> OTOH, a spare register doesn't exist on i686 and thus, this is approach is
>> limited to x86_64.
>> 
>> For determining the call target address, let ftrace_int3_call_trampoline
>> compare ftrace_update_func against the interrupted %rip.
>
> I don't think we need to do the compare.
>
>> If they match, an update_ftrace_func() is currently pending and the
>> call site is either of ftrace_call, ftrace_regs_call or the call insn
>> within a fops' trampoline. Jump to the ftrace_update_func_dest location as
>> set by update_ftrace_func().
>> 
>> If OTOH the interrupted %rip doesn't equal ftrace_update_func, then
>> it points to an mcount call site. In this case, redirect control flow to
>> the most generic handler, ftrace_regs_caller, which will then do the right
>> thing.
>> 
>> Finally, reading ftrace_update_func and ftrace_update_func_dest from
>> outside of the int3 handler requires some care: inbetween adding and
>> removing the breakpoints, ftrace invokes run_sync() which basically
>> issues a couple of IPIs. Because the int3 handler has interrupts disabled,
>> it orders with run_sync().
>> 
>> To extend that ordering to also include ftrace_int3_call_trampoline, make
>> ftrace_int3_handler() disable interrupts within the iret frame returning to
>> it.
>> 
>> Store the original EFLAGS.IF into %r11's MSB which, because it represents
>> a kernel address, is redundant. Make ftrace_int3_call_trampoline restore
>> it when done.
>
> This can be made much simpler by making a hardcoded ftrace_int3_tramp
> that does the following:
>
> ftrace_int3_tramp
> 	push	%r11
> 	jmp	ftrace_caller


But wouldn't this mean that ftrace_caller could nest if the breakpoint
in question happened to be placed at ftrace_call? Infinite recursion set
aside, the ip value determined from inner calls based on the on-stack
return address would be wrong, AFAICS. Or am I missing something here?


> The ftrace_caller will either call a single ftrace callback, if there's
> only a single ops registered, or it will call the loop function that
> iterates over all the ftrace_ops registered and will call each function
> that matches the ftrace_ops hashes.
>
> In either case, it's what we want.

Ok, under the assumption that my above point is valid, this patch could
still get simplified a lot by having two trampolines:

1.) Your ftrace_int3_tramp from above, to be used if the patched insn is
    some mcount call site. The live patching fops will need
    ftrace_regs_caller though. So,

	ftrace_int3_tramp_regs_caller:
		push %r11
                jmp ftrace_regs_caller

2.) Another one redirecting control flow to ftrace_ops_list_func(). It's
    going to be used when the int3 is found at ftrace_call or
    ftrace_regs_call resp. their counterparts in some ftrace_ops'
    trampoline.

	ftrace_int3_tramp_list_func:
		push %r11
                jmp ftrace_ops_list_func

ftrace_int3_handler() would then distinguish the following cases:
1.) ip == ftrace_graph_call => ignore, i.e. skip the insn
2.) is_ftrace_caller(ip) => ftrace_int3_tramp_list_func
3.) ftrace_location(ip) => ftrace_int3_tramp_regs_caller

ftrace_location() is getting invoked from ftrace_int3_handler() already,
so there wouldn't be any additional cost.

If that makes sense to you, I'd prepare a patch...


> The ftrace_int3_tramp will simply simulate the call ftrace_caller that
> would be there as the default caller if more than one function is set
> to it.
>
> For 32 bit, we could add 4 variables on the thread_info and make 4
> trampolines, one for each context (normal, softirq, irq and NMI), and
> have them use the variable stored in the thread_info for that location:
>
> ftrace_int3_tramp_normal
> 	push %eax # just for space
> 	push %eax
> 	mov whatever_to_get_thread_info, %eax
> 	mov normal(%eax), %eax
> 	mov %eax, 4(%esp)
> 	pop %eax
> 	jmp ftrace_caller
>
> ftrace_int3_tramp_softiqr
> 	push %eax # just for space
> 	push %eax
> 	mov whatever_to_get_thread_info, %eax
> 	mov softirq(%eax), %eax
> 	mov %eax, 4(%esp)
> 	pop %eax
> 	jmp ftrace_caller
>
> etc..
>
>
> A bit crazier for 32 bit but it can still be done. ;-)

Ok, but currently CONFIG_HAVE_LIVEPATCH=n for x86 && !x86_64,
so I'd rather not invest too much energy into screwing 32 bit up ;)

Thanks!

Nicolai


-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

next prev parent reply	other threads:[~2019-04-23 18:15 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-26 10:40 [RFC PATCH 0/1] x86/ftrace: fix live patching vs. tracing race Nicolai Stange
2018-07-26 10:40 ` [RFC PATCH 1/1] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Nicolai Stange
2019-04-19 20:05   ` Steven Rostedt
2019-04-23 18:15     ` Nicolai Stange [this message]
2019-04-23 23:50       ` Steven Rostedt
2019-04-24  6:20         ` Nicolai Stange
2019-04-24 12:35           ` Steven Rostedt
2018-07-26 14:23 ` [RFC PATCH 0/1] x86/ftrace: fix live patching vs. tracing race Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ef5sfuzu.fsf@suse.de \
    --to=nstange@suse.de \
    --cc=hpa@zytor.com \
    --cc=jikos@kernel.org \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=live-patching@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mbenes@suse.cz \
    --cc=mingo@redhat.com \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.