From: Naveen N Rao <naveen@kernel.org>
To: Christophe Leroy <christophe.leroy@csgroup.eu>,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Subject: Re: [RFC PATCH] powerpc/ftrace: Refactoring and support for -fpatchable-function-entry
Date: Sat, 20 May 2023 23:47:22 +0530 [thread overview]
Message-ID: <1684605928.yl2udzpst9.naveen@kernel.org> (raw)
In-Reply-To: <85460820-e5e0-57e3-68a7-dd7a562c9eb0@csgroup.eu>
Christophe Leroy wrote:
>
>
> Le 20/05/2023 à 12:34, Christophe Leroy a écrit :
>>
>>
>> Le 19/05/2023 à 21:26, Naveen N Rao a écrit :
>>> [Vous ne recevez pas souvent de courriers de naveen@kernel.org.
>>> Découvrez pourquoi ceci est important à
>>> https://aka.ms/LearnAboutSenderIdentification ]
>>>
>>> Refactor ftrace code and move to using ftrace_replace_code() to help
>>> simplify and make the code more maintainable.
>>>
>>> - The existing ftrace.c code is moved to a separate file so that ppc64
>>> elfv1 and clang -pg only support continue. This makes it possible to
>>> converge ppc32 and ppc64 support further.
>>> - Drop code to re-purpose compiler-generated long branches for ftrace
>>> use in support of large kernels. We still retain the ftrace stubs at
>>> the end of .text, so we now support kernels upto ~64MB.
>>> - Add ftrace_init_nop() to keep boot-time validations and init separate
>>> from runtime.
>>> - Implement ftrace_replace_code() to simplify overall ftrace setup. This
>>> will be especially useful when adding ability to nop out 'mflr r0'
>>> later, and for other subsequent ftrace features.
>>> - Add support for -fpatchable-function-entry. On ppc64, this needs gcc
>>> v13.1 so that the nops are generated at LEP. This also moves ppc32 to
>>> using the same two-instruction sequence as that of ppc64.
>>>
>>> This applies atop patches 1-3 of Nick's series for elfv2 conversion, as
>>> well as Nick's patch enabling -mprofile-kernel for elfv2 BE:
>>> - https://lore.kernel.org/all/20230505071850.228734-1-npiggin@gmail.com/
>>> - https://lore.kernel.org/all/20230506011814.8766-1-npiggin@gmail.com/
>>>
>>> This builds for me and passes a quick test, posting this as an early
>>> RFC.
>>>
>>> Signed-off-by: Naveen N Rao <naveen@kernel.org>
>>
>> Looks good, works on PPC32 but I observed some performance degradation,
>> around 25% more time needed to activate function tracer and around 10%
>> more time needed to de-activate function tracer (by writting
>> function/nop into /sys/kernel/debug/tracing/current_tracer.
Thanks for the test!
I hadn't looked at the performance, though I was expecting it to be
better. On ppc64, I am actually not seeing much of a difference.
>
>
> perf record with your patch applied:
>
> 20.59% echo [kernel.kallsyms] [k] ftrace_check_record
> 15.71% echo [kernel.kallsyms] [k] patch_instruction
> 6.75% echo [kernel.kallsyms] [k] ftrace_replace_code
> 4.30% echo [kernel.kallsyms] [k] __ftrace_hash_rec_update
> 3.96% echo [kernel.kallsyms] [k]
> __rb_reserve_next.constprop.0
> 3.20% echo [kernel.kallsyms] [k] ftrace_get_call_inst.isra.0
> 2.62% echo [kernel.kallsyms] [k] ftrace_get_addr_new
> 2.44% echo [kernel.kallsyms] [k] ftrace_rec_iter_next
> 2.15% echo [kernel.kallsyms] [k] function_trace_call
> 2.09% echo [kernel.kallsyms] [k] rb_commit
> 1.92% echo [kernel.kallsyms] [k] ring_buffer_unlock_commit
> 1.69% echo [kernel.kallsyms] [k] ring_buffer_lock_reserve
> 1.63% echo [kernel.kallsyms] [k] copy_page
> 1.45% echo [kernel.kallsyms] [k]
> ftrace_create_branch_inst.constprop.0
> 1.40% echo [kernel.kallsyms] [k] unmap_page_range
> 1.34% echo [kernel.kallsyms] [k] mas_next_entry
> 1.28% echo ld-2.23.so [.] do_lookup_x
> 1.22% echo [kernel.kallsyms] [k] ftrace_call
> 1.05% echo [kernel.kallsyms] [k] trace_function
> 0.99% echo [kernel.kallsyms] [k] ftrace_caller
> 0.81% echo [kernel.kallsyms] [k] ftrace_rec_iter_record
>
> perf record without your patch:
>
> 22.58% echo [kernel.kallsyms] [k] patch_instruction
> 17.85% echo [kernel.kallsyms] [k] ftrace_check_record
> 11.65% echo [kernel.kallsyms] [k] ftrace_replace_code
> 6.76% echo [kernel.kallsyms] [k] ftrace_make_call
> 6.68% echo [kernel.kallsyms] [k] __ftrace_hash_rec_update
> 3.50% echo [kernel.kallsyms] [k] ftrace_get_addr_curr
> 3.42% echo [kernel.kallsyms] [k] ftrace_get_addr_new
> 2.36% echo [kernel.kallsyms] [k] copy_page
> 1.22% echo [kernel.kallsyms] [k] __rb_reserve_next.constprop.0
> 1.22% echo ld-2.23.so [.] do_lookup_x
> 1.06% echo [kernel.kallsyms] [k] ftrace_lookup_ip
> 0.73% echo ld-2.23.so [.] _dl_relocate_object
> 0.65% echo [kernel.kallsyms] [k] flush_dcache_icache_page
> 0.65% echo [kernel.kallsyms] [k] function_trace_call
That suggests ftrace_test_record() as the likely cause. The below change
does improve performance on ppc64. Can you see if it makes a difference
on ppc32?
Upstream/before the below change (ftrace activation):
0.15266 +- 0.00215 seconds time elapsed ( +- 1.41% )
With the below change:
0.14170 +- 0.00396 seconds time elapsed ( +- 2.79% )
- Naveen
---
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index a9d57f338bd78e..8b2096ec77bba2 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -167,23 +167,22 @@ void ftrace_replace_code(int enable)
for_ftrace_rec_iter(iter) {
rec = ftrace_rec_iter_record(iter);
- update = ftrace_test_record(rec, enable);
ip = rec->ip;
- new_addr = 0;
+ addr = ftrace_get_addr_curr(rec);
+ new_addr = ftrace_get_addr_new(rec);
+ update = ftrace_update_record(rec, enable);
switch (update) {
case FTRACE_UPDATE_IGNORE:
default:
continue;
case FTRACE_UPDATE_MODIFY_CALL:
- addr = ftrace_get_addr_curr(rec);
- new_addr = ftrace_get_addr_new(rec);
break;
case FTRACE_UPDATE_MAKE_CALL:
- addr = ftrace_get_addr_new(rec);
- break;
+ addr = new_addr;
+ fallthrough;
case FTRACE_UPDATE_MAKE_NOP:
- addr = ftrace_get_addr_curr(rec);
+ new_addr = 0;
break;
}
nop_inst = ppc_inst(PPC_RAW_NOP());
@@ -213,7 +212,6 @@ void ftrace_replace_code(int enable)
ret = ftrace_modify_code(ip, old, new);
if (ret)
goto out;
- ftrace_update_record(rec, enable);
}
out:
next prev parent reply other threads:[~2023-05-20 18:21 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-19 19:26 [RFC PATCH] powerpc/ftrace: Refactoring and support for -fpatchable-function-entry Naveen N Rao
2023-05-20 10:34 ` Christophe Leroy
2023-05-20 16:28 ` Christophe Leroy
2023-05-20 17:48 ` Christophe Leroy
2023-05-20 18:17 ` Naveen N Rao [this message]
2023-05-21 9:14 ` Christophe Leroy
2023-05-23 9:31 ` Naveen N Rao
2023-05-26 5:35 ` Christophe Leroy
2023-06-07 17:05 ` Naveen N Rao
2023-05-20 18:28 ` Naveen N Rao
2023-05-23 13:20 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1684605928.yl2udzpst9.naveen@kernel.org \
--to=naveen@kernel.org \
--cc=christophe.leroy@csgroup.eu \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=npiggin@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.