* Perf supporting function reordering?
@ 2014-03-17 16:22 William Cohen
2014-03-17 16:35 ` Andi Kleen
0 siblings, 1 reply; 4+ messages in thread
From: William Cohen @ 2014-03-17 16:22 UTC
To: linux-perf-users@vger.kernel.org
Has there been any thought about perf supporting function reordering? The
kernel had a function reorder option that was available in Linux
2.6.17 to 2.6.21. Kernel code performs poorly compared with
user-space code. In a simple experiment (a kernel build), the
kernel was taking an L1 icache miss every 60 instructions,
versus one every 260 instructions for user space.
The difference in IPC was also significant: the kernel code ran at 0.55
IPC, while user space ran at 1.59 IPC.
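To put those figures on a common scale, here is a minimal sketch that converts "one miss every N instructions" into misses per 1000 instructions (MPKI) and computes the IPC ratio. The helper name mpki is made up for illustration; the inputs are the numbers quoted above.

```shell
# Misses per 1000 instructions (MPKI) from "one miss every N instructions".
# mpki is a hypothetical helper name, not part of perf.
mpki() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", 1000 / n }'
}

kernel_mpki=$(mpki 60)     # kernel: one L1 icache miss per 60 instructions
user_mpki=$(mpki 260)      # user space: one miss per 260 instructions
ipc_ratio=$(awk 'BEGIN { printf "%.1f\n", 1.59 / 0.55 }')

echo "kernel: $kernel_mpki MPKI, user: $user_mpki MPKI, IPC ratio: $ipc_ratio"
```

That works out to roughly 16.7 versus 3.8 misses per 1000 instructions, with user space retiring about 2.9 times as many instructions per cycle.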
The arguments for removing the function reorder code in Linux 2.6.22 were:
-the linker was slowed too much by the many sections
-the manually generated ordering list got out of date
-workloads are too diverse
With perf it is easy to collect information about runtime
characteristics. Over the weekend I collected call
graph information for a kernel build on my system with perf, then
rendered the data with gprof2dot and dot:
export training=training/make_a_g_branch_k
sudo perf record -a -g -e branches:k -o $training.data su wcohen -c "make -j4"
sudo chown wcohen $training.data; sudo chgrp wcohen $training.data
perf script -i $training.data| gprof2dot --format=perf > $training.gv
dot -Tsvg < $training.gv > $training.svg
By default gprof2dot prunes nodes and edges. The following provides a
more complete graph:
perf script -i $training.data| gprof2dot --format=perf -n 0.05 -e 0.01 > training/make_a_g_branch_k_2.gv
The resulting graphs are at:
http://people.redhat.com/wcohen/sediment/make_a_g_branch_k.svg
http://people.redhat.com/wcohen/sediment/make_a_g_branch_k_2.svg
The graphs give some indication of the flow through the kernel
code. Searching for "system_call" shows the kernel entry point and where
things branch out from there. Other places of interest are "page_fault" and
"apic_timer_interrupt".
Any thoughts on making it easier for perf to make this statistical
callgraph information available and on using it to do code reordering? I
have experimented with code reordering on the user-space postgres package,
and it helped performance, with about a 5% improvement in IPC
(http://people.redhat.com/wcohen/sediment/html/pop.html).
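As a rough illustration of turning profile data into an ordering list, here is a minimal sketch. It assumes the perf data has already been reduced to "<sample count> <function name>" lines (a hypothetical intermediate format), and it simply lists the hottest functions first rather than clustering by call graph edges. The ".text.<function>" names follow what GCC's -ffunction-sections produces; whether a given linker can consume such a list (gold's --section-ordering-file is one candidate) is an assumption to verify. The helper name and the sample counts are made up.

```shell
# Input (hypothetical format): "<sample count> <function name>" per line.
# Output: one ".text.<function>" section name per line, hottest first.
# Ties on the count are broken by function name.
order_by_hotness() {
    sort -k1,1nr -k2,2 | awk '{ print ".text." $2 }'
}

# Made-up sample counts, for illustration only.
printf '%s\n' '120 page_fault' '300 system_call' '45 apic_timer_interrupt' |
    order_by_hotness
```

A callgraph-aware ordering would additionally try to place frequent caller/callee pairs next to each other, which this hotness-only sort does not attempt.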
-Will
* Re: Perf supporting function reordering?
2014-03-17 16:22 Perf supporting function reordering? William Cohen
@ 2014-03-17 16:35 ` Andi Kleen
2014-03-17 18:36 ` William Cohen
0 siblings, 1 reply; 4+ messages in thread
From: Andi Kleen @ 2014-03-17 16:35 UTC
To: William Cohen; +Cc: linux-perf-users@vger.kernel.org
William Cohen <wcohen@redhat.com> writes:
> Has there been any thought about perf supporting function reordering?
See autofdo http://gcc.gnu.org/wiki/AutoFDO and
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html
It's not in any standard compiler unfortunately.
Standard gcc can do it with profile feedback, but not for the standard kernel.
> Any thoughts on making it easier for perf make this statistical
> callgraph information available and using it to do code reordering? I
> have experimented with code reorder with user space postgres package
> and it did help performance about 5% improvement in IPC
Is the mechanism of the IPC improvement understood?
My understanding from older tools that did this is that the main advantage of
pure reordering (not full profile feedback, which has many advantages)
lies in startup time improvements and slightly lower TLB overhead,
apart from a slightly smaller working set.
However this all does not apply to the kernel, which does not do demand
paging. In general modern CPUs are pretty good at prefetching code.
For the TLB issues the better strategy is likely just going for
large pages, as Kirill's MM work enables.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
* Re: Perf supporting function reordering?
2014-03-17 16:35 ` Andi Kleen
@ 2014-03-17 18:36 ` William Cohen
2014-03-17 23:39 ` Andi Kleen
0 siblings, 1 reply; 4+ messages in thread
From: William Cohen @ 2014-03-17 18:36 UTC
To: Andi Kleen; +Cc: linux-perf-users@vger.kernel.org
On 03/17/2014 12:35 PM, Andi Kleen wrote:
> William Cohen <wcohen@redhat.com> writes:
>
>> Has there been any thought about perf supporting function reordering?
>
> See autofdo http://gcc.gnu.org/wiki/AutoFDO and
> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html
>
> It's not in any standard compiler unfortunately.
>
> Standard gcc can do it with profile feedback, but not for the standard kernel.
You mean GCC's "-freorder-functions"? That is rather coarse. According to the link below it only groups functions into hot and cold sections.
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
>
>> Any thoughts on making it easier for perf make this statistical
>> callgraph information available and using it to do code reordering? I
>> have experimented with code reorder with user space postgres package
>> and it did help performance about 5% improvement in IPC
>
> Is the mechanism of the IPC improvement understood?
Most of the IPC performance improvement in the experiment could be explained by the reduction in the iTLB misses.
>
> My understanding from older tools that did this the main advantage of
> pure reordering (not full profile feedback, which has many advantages)
> is mainly in startup time improvements and lowering the TLB overhead
> slightly, apart from slightly smaller working.
Yes, improved startup time, fewer iTLB updates, and a smaller working set are the expected results of function reordering.
>
> However this all does not apply to the kernel, which does not do demand
> paging. In general modern CPUs are pretty good at prefetching code.
>
> For the TLB issues the better strategy is likely just going for
> large pages, as Kirill's MM work enables.
For the kernel code, demand paging and iTLB misses are less of an issue. Is module code loaded into huge pages, or does it use normal-sized pages? If the modules are using normal-sized pages, then wouldn't some of the large modules (for example kvm, i915 and nouveau) benefit from function reordering?
-Will
* Re: Perf supporting function reordering?
2014-03-17 18:36 ` William Cohen
@ 2014-03-17 23:39 ` Andi Kleen
0 siblings, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2014-03-17 23:39 UTC
To: William Cohen; +Cc: Andi Kleen, linux-perf-users@vger.kernel.org
On Mon, Mar 17, 2014 at 02:36:10PM -0400, William Cohen wrote:
> On 03/17/2014 12:35 PM, Andi Kleen wrote:
> > William Cohen <wcohen@redhat.com> writes:
> >
> >> Has there been any thought about perf supporting function reordering?
> >
> > See autofdo http://gcc.gnu.org/wiki/AutoFDO and
> > http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html
> >
> > It's not in any standard compiler unfortunately.
> >
> > Standard gcc can do it with profile feedback, but not for the standard kernel.
>
> You mean GCC's "-freorder-functions"? That is rather coarse. According to the link below it only groups functions into hot and cold sections.
No, the IPA passes group the whole program by the global callgraph
(either per unit or globally with LTO).
It also has special support for grouping C++ constructors.
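For reference, profile feedback in stock GCC follows a generate/train/use cycle. The sketch below wraps that cycle in a hypothetical helper, build_with_feedback, for a single C source file in the current directory. -fprofile-generate, -fprofile-use and -freorder-functions are real GCC options; which layout optimizations actually run depends on the GCC version and the other flags in use, so this is only an outline of the workflow.

```shell
# Hypothetical helper: build "$1" (a C file in the current directory)
# with GCC profile feedback. Nothing runs until the function is called.
build_with_feedback() {
    src=$1
    exe=${src%.c}
    # 1. Instrumented build.
    gcc -O2 -fprofile-generate -o "$exe" "$src" &&
    # 2. Training run; writes the profile (.gcda) data on exit.
    "./$exe" &&
    # 3. Rebuild using the collected profile.
    gcc -O2 -fprofile-use -freorder-functions -o "$exe" "$src"
}

# Usage (assumes a file named hello.c exists): build_with_feedback hello.c
```

The training run should exercise the workload that matters, since the profile only reflects code paths that were actually executed.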
>
> For the kernel code demand paging and iTLB misses are less of an issue. Is modules code loaded into hugepages or do they use normal sized pages? If the modules are using normal sized pages, then wouldn't some of the large modules (for example kvm, i915 and nouveau) benefit from function reordering?
Today the modules use small pages.
Some 2.4 kernels actually put them into the direct (2MB) mapping.
That could be done again.
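To make the page-size difference concrete, here is a minimal sketch that counts how many pages, and so how many distinct TLB translations, are needed to map a region of code. The 1 MiB text size is a made-up figure, not a measurement of any real module (the Size column of lsmod gives real values), and pages_needed is a hypothetical helper name.

```shell
# Number of pages needed to map a region: ceil(bytes / page_size).
pages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

text_bytes=$((1024 * 1024))        # hypothetical 1 MiB of module text
small_page=$((4 * 1024))           # 4 KiB small page (x86)
large_page=$((2 * 1024 * 1024))    # 2 MiB large page

echo "4 KiB pages: $(pages_needed "$text_bytes" "$small_page")"
echo "2 MiB pages: $(pages_needed "$text_bytes" "$large_page")"
```

With these assumed sizes the small-page mapping needs 256 translations where a single large page would do, which is why the mapping granularity matters more for iTLB pressure than function order alone.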
-Andi
--
ak@linux.intel.com -- Speaking for myself only.