* [rcu] c0f4dfd4f9: -53% perf-stat.cpu-migrations
From: Fengguang Wu @ 2014-01-24 12:33 UTC
To: Paul E. McKenney; +Cc: LKML
Hi Paul,
Just FYI, we noticed -53% perf-stat.cpu-migrations in dd write tests
on btrfs, which looks good. First good commit is
commit c0f4dfd4f90f1667d234d21f15153ea09a2eaa66
Author: Paul E. McKenney <paul.mckenney@linaro.org>
AuthorDate: Fri Dec 28 11:30:36 2012 -0800
Commit: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CommitDate: Tue Mar 26 08:04:51 2013 -0700
rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
Because RCU callbacks are now associated with the number of the grace
period that they must wait for, CPUs can now advance callbacks
corresponding to grace periods that ended while a given CPU was in
dyntick-idle mode. This eliminates the need to try forcing the RCU
state machine while entering idle, thus reducing the CPU intensiveness
of RCU_FAST_NO_HZ, which should increase its energy efficiency.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
 Documentation/kernel-parameters.txt |  28 ++-
 include/linux/rcupdate.h            |   1 +
 init/Kconfig                        |  17 +-
 kernel/rcutree.c                    |  28 +--
 kernel/rcutree.h                    |  12 +-
 kernel/rcutree_plugin.h             | 374 ++++++++++--------------------------
 kernel/rcutree_trace.c              |   2 -
 7 files changed, 149 insertions(+), 313 deletions(-)
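The idea in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel's implementation: the class `CpuCallbacks`, its methods, and the "tag = current grace period + 1" rule are hypothetical simplifications chosen for illustration. It shows why numbering helps: a CPU waking from idle only has to compare each callback's tag against the number of the last completed grace period.

```python
from collections import deque


class CpuCallbacks:
    """Toy per-CPU callback queue; each entry is tagged with a grace-period number."""

    def __init__(self):
        self.pending = deque()  # (gp_number, callback), in queue order

    def enqueue(self, current_gp, callback):
        # Assumption of this model: a callback queued during grace period N
        # may run once grace period N + 1 has completed.
        self.pending.append((current_gp + 1, callback))

    def advance(self, completed_gp):
        """Pop and return callbacks whose tagged grace period has completed."""
        ready = []
        while self.pending and self.pending[0][0] <= completed_gp:
            ready.append(self.pending.popleft()[1])
        return ready


cpu = CpuCallbacks()
cpu.enqueue(current_gp=4, callback="free_a")  # tagged 5
cpu.enqueue(current_gp=5, callback="free_b")  # tagged 6
# The CPU idles while grace period 5 completes elsewhere; on wakeup it
# compares tags against the completed number instead of forcing the
# grace-period machinery itself.
ready = cpu.advance(completed_gp=5)
```

In this model only the first callback becomes ready; the second stays queued until a later grace period completes.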
b11cc5760a9c48c  c0f4dfd4f90f1667d234d21f1
---------------  -------------------------
     86878 ~138%    -90.3%       8397 ~152%  cpuidle.POLL.time
       154 ~16%     -87.3%         19 ~55%   cpuidle.POLL.usage
  12177976 ~ 4%     -85.6%    1748244 ~20%   cpuidle.C1-NHM.time
    381439 ~ 3%     -68.4%     120538 ~ 2%   softirqs.RCU
      0.53 ~87%    +161.8%       1.40 ~16%   perf-profile.cpu-cycles.copy_user_generic_string.__btrfs_buffered_write.btrfs_file_aio_write.do_sync_write.vfs_write
   5227241 ~ 4%     -58.3%    2180928 ~ 7%   cpuidle.C1E-NHM.time
      0.67 ~88%     +88.6%       1.26 ~21%   perf-profile.cpu-cycles.calc_csum_metadata_size.btrfs_delalloc_release_metadata.btrfs_clear_bit_hook.clear_state_bit.clear_extent_bit
    231531 ~ 2%     -48.3%     119653 ~ 2%   interrupts.LOC
     91019 ~ 2%     -40.2%      54404 ~ 2%   cpuidle.C3-NHM.usage
 1.991e+08 ~ 3%     -36.7%   1.26e+08 ~ 7%   cpuidle.C3-NHM.time
      7.07 ~ 4%     -32.7%       4.76 ~ 8%   turbostat.%c3
     23380 ~33%     +41.2%      33024 ~ 6%   proc-vmstat.kswapd_low_wmark_hit_quickly
     62805 ~ 3%     -28.4%      44960 ~ 2%   softirqs.SCHED
     64678 ~ 1%     -30.1%      45195 ~ 1%   softirqs.TIMER
     55051 ~ 3%     -22.2%      42823 ~ 2%   interrupts.0:IO-APIC-edge.timer
       920 ~ 4%     +19.2%       1097 ~ 5%   slabinfo.kmalloc-512.active_objs
       920 ~ 4%     +19.2%       1097 ~ 5%   slabinfo.kmalloc-512.num_objs
    361987 ~ 2%      +9.9%     397730 ~ 0%   cpuidle.C6-NHM.usage
      5.30 ~ 1%      -9.7%       4.78 ~ 1%   turbostat.%c1
    178105 ~ 3%     -53.5%      82837 ~ 1%   perf-stat.cpu-migrations
      5763 ~ 8%     -44.7%       3186 ~22%   vmstat.system.cs
   3566268 ~ 8%     -44.8%    1968744 ~21%   perf-stat.context-switches
       658 ~ 2%     -30.4%        458 ~ 0%   vmstat.system.in
  53376814 ~12%     -24.2%   40482438 ~22%   perf-stat.node-load-misses
 2.996e+10 ~ 3%     -10.8%  2.672e+10 ~ 3%   perf-stat.L1-icache-load-misses
 1.998e+09 ~ 4%     -11.6%  1.766e+09 ~ 2%   perf-stat.branch-misses
 1.005e+12 ~ 5%     -11.9%  8.852e+11 ~ 6%   perf-stat.stalled-cycles-frontend
 6.344e+08 ~ 2%      -6.8%  5.915e+08 ~ 2%   perf-stat.LLC-store-misses
 2.892e+10 ~ 2%      +5.4%  3.047e+10 ~ 3%   perf-stat.bus-cycles
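As a quick sanity check on the table above, the change column appears to be the relative difference between the two commits' values. The sketch below assumes that reading; the helper name `percent_change` is hypothetical, and the three rows are copied from the table.

```python
def percent_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# (base commit value, new commit value) taken from the table rows
rows = {
    "perf-stat.cpu-migrations": (178105, 82837),
    "softirqs.RCU": (381439, 120538),
    "interrupts.LOC": (231531, 119653),
}

changes = {name: round(percent_change(b, n), 1) for name, (b, n) in rows.items()}
```

Rounded to one decimal place, these agree with the reported -53.5%, -68.4% and -48.3%.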
[ASCII plot of perf-stat.cpu-migrations, y-axis from 20000 to 90000: the '*' samples lie between about 80000 and 90000, the 'O' samples mostly near 30000.]
* Re: [rcu] c0f4dfd4f9: -53% perf-stat.cpu-migrations
From: Paul E. McKenney @ 2014-01-27 16:59 UTC
To: Fengguang Wu; +Cc: LKML
On Fri, Jan 24, 2014 at 08:33:20PM +0800, Fengguang Wu wrote:
> Hi Paul,
>
> Just FYI, we noticed -53% perf-stat.cpu-migrations in dd write tests
> on btrfs, which looks good. First good commit is
Nice! ;-)
Thanx, Paul
> commit c0f4dfd4f90f1667d234d21f15153ea09a2eaa66
> Author: Paul E. McKenney <paul.mckenney@linaro.org>
> AuthorDate: Fri Dec 28 11:30:36 2012 -0800
> Commit: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> CommitDate: Tue Mar 26 08:04:51 2013 -0700
>
> rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
>
> Because RCU callbacks are now associated with the number of the grace
> period that they must wait for, CPUs can now advance callbacks
> corresponding to grace periods that ended while a given CPU was in
> dyntick-idle mode. This eliminates the need to try forcing the RCU
> state machine while entering idle, thus reducing the CPU intensiveness
> of RCU_FAST_NO_HZ, which should increase its energy efficiency.
>
> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
>  Documentation/kernel-parameters.txt |  28 ++-
>  include/linux/rcupdate.h            |   1 +
>  init/Kconfig                        |  17 +-
>  kernel/rcutree.c                    |  28 +--
>  kernel/rcutree.h                    |  12 +-
>  kernel/rcutree_plugin.h             | 374 ++++++++++--------------------------
>  kernel/rcutree_trace.c              |   2 -
>  7 files changed, 149 insertions(+), 313 deletions(-)
>
> b11cc5760a9c48c  c0f4dfd4f90f1667d234d21f1
> ---------------  -------------------------
>      86878 ~138%    -90.3%       8397 ~152%  cpuidle.POLL.time
>        154 ~16%     -87.3%         19 ~55%   cpuidle.POLL.usage
>   12177976 ~ 4%     -85.6%    1748244 ~20%   cpuidle.C1-NHM.time
>     381439 ~ 3%     -68.4%     120538 ~ 2%   softirqs.RCU
>       0.53 ~87%    +161.8%       1.40 ~16%   perf-profile.cpu-cycles.copy_user_generic_string.__btrfs_buffered_write.btrfs_file_aio_write.do_sync_write.vfs_write
>    5227241 ~ 4%     -58.3%    2180928 ~ 7%   cpuidle.C1E-NHM.time
>       0.67 ~88%     +88.6%       1.26 ~21%   perf-profile.cpu-cycles.calc_csum_metadata_size.btrfs_delalloc_release_metadata.btrfs_clear_bit_hook.clear_state_bit.clear_extent_bit
>     231531 ~ 2%     -48.3%     119653 ~ 2%   interrupts.LOC
>      91019 ~ 2%     -40.2%      54404 ~ 2%   cpuidle.C3-NHM.usage
>  1.991e+08 ~ 3%     -36.7%   1.26e+08 ~ 7%   cpuidle.C3-NHM.time
>       7.07 ~ 4%     -32.7%       4.76 ~ 8%   turbostat.%c3
>      23380 ~33%     +41.2%      33024 ~ 6%   proc-vmstat.kswapd_low_wmark_hit_quickly
>      62805 ~ 3%     -28.4%      44960 ~ 2%   softirqs.SCHED
>      64678 ~ 1%     -30.1%      45195 ~ 1%   softirqs.TIMER
>      55051 ~ 3%     -22.2%      42823 ~ 2%   interrupts.0:IO-APIC-edge.timer
>        920 ~ 4%     +19.2%       1097 ~ 5%   slabinfo.kmalloc-512.active_objs
>        920 ~ 4%     +19.2%       1097 ~ 5%   slabinfo.kmalloc-512.num_objs
>     361987 ~ 2%      +9.9%     397730 ~ 0%   cpuidle.C6-NHM.usage
>       5.30 ~ 1%      -9.7%       4.78 ~ 1%   turbostat.%c1
>     178105 ~ 3%     -53.5%      82837 ~ 1%   perf-stat.cpu-migrations
>       5763 ~ 8%     -44.7%       3186 ~22%   vmstat.system.cs
>    3566268 ~ 8%     -44.8%    1968744 ~21%   perf-stat.context-switches
>        658 ~ 2%     -30.4%        458 ~ 0%   vmstat.system.in
>   53376814 ~12%     -24.2%   40482438 ~22%   perf-stat.node-load-misses
>  2.996e+10 ~ 3%     -10.8%  2.672e+10 ~ 3%   perf-stat.L1-icache-load-misses
>  1.998e+09 ~ 4%     -11.6%  1.766e+09 ~ 2%   perf-stat.branch-misses
>  1.005e+12 ~ 5%     -11.9%  8.852e+11 ~ 6%   perf-stat.stalled-cycles-frontend
>  6.344e+08 ~ 2%      -6.8%  5.915e+08 ~ 2%   perf-stat.LLC-store-misses
>  2.892e+10 ~ 2%      +5.4%  3.047e+10 ~ 3%   perf-stat.bus-cycles
>
>
> [ASCII plot of perf-stat.cpu-migrations, y-axis from 20000 to 90000: the '*' samples lie between about 80000 and 90000, the 'O' samples mostly near 30000.]
>