* [tip:locking/futex] [futex] bd54df5ea7: will-it-scale.per_thread_ops 33.9% improvement
From: kernel test robot @ 2025-05-14 2:33 UTC
To: Sebastian Andrzej Siewior
Cc: oe-lkp, lkp, linux-kernel, x86, Peter Zijlstra, linux-mm, oliver.sang
Hello,
kernel test robot noticed a 33.9% improvement of will-it-scale.per_thread_ops on:
commit: bd54df5ea7cadac520e346d5f0fe5d58e635b6ba ("futex: Allow to resize the private local hash")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git locking/futex
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6767P CPU @ 2.4GHz (Granite Rapids) with 256G memory
parameters:
nr_task: 100%
mode: thread
test: pthread_mutex5
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250513/202505131609.20984254-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-gnr-2sp3/pthread_mutex5/will-it-scale
commit:
7c4f75a21f ("futex: Allow automatic allocation of process wide futex hash")
bd54df5ea7 ("futex: Allow to resize the private local hash")
7c4f75a21f636486 bd54df5ea7cadac520e346d5f0f
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
23570282 -32.6% 15883630 ± 2% cpuidle..usage
1862635 -9.3% 1689404 meminfo.Shmem
2110 +19.0% 2512 ± 3% perf-c2c.DRAM.local
0.16 ± 4% -0.1 0.08 ± 4% mpstat.cpu.all.soft%
0.63 -0.2 0.46 ± 3% mpstat.cpu.all.usr%
1264859 ± 2% -47.5% 664434 ± 62% numa-vmstat.node1.nr_file_pages
38897 ± 10% -47.8% 20323 ± 48% numa-vmstat.node1.nr_mapped
206687 -33.5% 137401 ± 2% vmstat.system.cs
427708 -8.0% 393532 vmstat.system.in
5060133 ± 2% -47.5% 2658326 ± 62% numa-meminfo.node1.FilePages
158778 ± 10% -48.5% 81837 ± 46% numa-meminfo.node1.Mapped
6620342 ± 2% -38.3% 4086741 ± 37% numa-meminfo.node1.MemUsed
9566224 +33.9% 12810946 will-it-scale.256.threads
0.18 -11.1% 0.16 will-it-scale.256.threads_idle
37367 +33.9% 50042 will-it-scale.per_thread_ops
9566224 +33.9% 12810946 will-it-scale.workload
0.00 ± 15% +29.7% 0.00 ± 15% sched_debug.cpu.next_balance.stddev
124704 -33.5% 82964 ± 2% sched_debug.cpu.nr_switches.avg
230832 ± 52% -38.2% 142628 ± 5% sched_debug.cpu.nr_switches.max
98911 ± 4% -33.7% 65543 ± 3% sched_debug.cpu.nr_switches.min
17307 ± 60% -47.4% 9105 ± 20% sched_debug.cpu.nr_switches.stddev
672002 -6.5% 628169 proc-vmstat.nr_active_anon
1345624 -3.2% 1302363 proc-vmstat.nr_file_pages
41725 ± 7% -16.3% 34939 ± 12% proc-vmstat.nr_mapped
465688 -9.3% 422425 proc-vmstat.nr_shmem
672002 -6.5% 628169 proc-vmstat.nr_zone_active_anon
1956811 -2.5% 1908264 proc-vmstat.numa_hit
1692181 -2.8% 1644262 proc-vmstat.numa_local
0.20 +4.3% 0.21 perf-stat.i.MPKI
0.05 -0.0 0.05 perf-stat.i.branch-miss-rate%
9101814 -10.3% 8161953 perf-stat.i.branch-misses
14404131 +3.7% 14939924 perf-stat.i.cache-misses
207911 -33.5% 138184 ± 2% perf-stat.i.context-switches
65204 -4.0% 62625 perf-stat.i.cycles-between-cache-misses
0.01 -95.2% 0.00 ±223% perf-stat.i.metric.K/sec
0.20 +4.2% 0.21 perf-stat.overall.MPKI
0.05 -0.0 0.05 perf-stat.overall.branch-miss-rate%
63438 -3.5% 61223 perf-stat.overall.cycles-between-cache-misses
2250086 -25.7% 1671327 perf-stat.overall.path-length
9086343 -10.4% 8139691 perf-stat.ps.branch-misses
14400345 +3.6% 14922252 perf-stat.ps.cache-misses
207422 -33.5% 137839 ± 2% perf-stat.ps.context-switches
0.16 +99.2% 0.32 ± 95% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
1.66 ± 12% +17.5% 1.95 ± 3% perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
0.08 ± 8% +37.8% 0.12 ± 20% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.01 ± 12% +47.5% 0.01 ± 5% perf-sched.sch_delay.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
0.09 ±166% +1763.7% 1.74 ± 65% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
0.09 +16.3% 0.11 ± 3% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.98 ± 14% +28.2% 3.83 ± 4% perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
0.18 ± 5% +248.1% 0.61 ± 63% perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.15 ±186% +1714.0% 2.76 ± 49% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
0.01 ± 12% +45.3% 0.02 ± 5% perf-sched.total_sch_delay.average.ms
2.91 ± 2% +61.4% 4.69 ± 4% perf-sched.total_wait_and_delay.average.ms
556081 ± 2% -37.0% 350186 ± 2% perf-sched.total_wait_and_delay.count.ms
2.89 ± 2% +61.5% 4.67 ± 4% perf-sched.total_wait_time.average.ms
0.01 ± 6% +35.6% 0.02 ± 3% perf-sched.wait_and_delay.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
18.90 ± 3% -15.5% 15.98 perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
541651 ± 2% -37.0% 341352 ± 2% perf-sched.wait_and_delay.count.futex_do_wait.__futex_wait.futex_wait.do_futex
11.50 ± 18% -84.1% 1.83 ±223% perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
253.67 ± 3% +17.1% 297.00 perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.09 ±166% +1763.7% 1.74 ± 65% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
18.79 ± 3% -15.6% 15.85 perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.15 ±186% +1714.0% 2.76 ± 49% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
43.55 -1.5 42.06 perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex
43.54 -1.5 42.04 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.futex_wait_setup.__futex_wait.futex_wait
43.83 -1.3 42.54 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
43.83 -1.3 42.54 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
43.76 -1.3 42.48 perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex
99.06 +0.2 99.25 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
99.05 +0.2 99.24 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
99.03 +0.2 99.22 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
99.02 +0.2 99.22 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
54.99 +1.1 56.14 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.futex_wake.do_futex.__x64_sys_futex
55.02 +1.2 56.21 perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
55.19 +1.5 56.68 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
43.83 -1.3 42.54 perf-profile.children.cycles-pp.__futex_wait
43.83 -1.3 42.54 perf-profile.children.cycles-pp.futex_wait
43.76 -1.3 42.48 perf-profile.children.cycles-pp.futex_wait_setup
98.55 -0.3 98.21 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
98.59 -0.3 98.28 perf-profile.children.cycles-pp._raw_spin_lock
0.37 -0.1 0.26 perf-profile.children.cycles-pp.pthread_mutex_lock
0.60 ± 3% -0.1 0.49 ± 3% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.58 ± 3% -0.1 0.47 ± 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.20 ± 5% -0.1 0.10 ± 9% perf-profile.children.cycles-pp.handle_softirqs
0.18 ± 5% -0.1 0.09 ± 6% perf-profile.children.cycles-pp.sched_balance_domains
0.21 ± 4% -0.1 0.12 ± 4% perf-profile.children.cycles-pp.__irq_exit_rcu
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.common_startup_64
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.cpu_startup_entry
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.do_idle
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.start_secondary
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.acpi_idle_do_entry
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.acpi_idle_enter
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.acpi_safe_halt
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.pv_native_safe_halt
0.11 ± 4% -0.0 0.08 perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.10 -0.0 0.07 ± 5% perf-profile.children.cycles-pp.__schedule
0.11 -0.0 0.08 ± 4% perf-profile.children.cycles-pp.cpuidle_enter
0.06 ± 7% -0.0 0.03 ± 70% perf-profile.children.cycles-pp.futex_do_wait
0.11 ± 3% -0.0 0.08 ± 4% perf-profile.children.cycles-pp.cpuidle_enter_state
0.11 -0.0 0.08 perf-profile.children.cycles-pp.cpuidle_idle_call
0.08 -0.0 0.05 ± 7% perf-profile.children.cycles-pp.sysvec_call_function_single
0.00 +0.1 0.05 perf-profile.children.cycles-pp.futex_q_unlock
0.07 +0.1 0.12 ± 3% perf-profile.children.cycles-pp.futex_q_lock
0.00 +0.2 0.17 perf-profile.children.cycles-pp.futex_hash_put
99.22 +0.2 99.40 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
99.22 +0.2 99.40 perf-profile.children.cycles-pp.do_syscall_64
99.03 +0.2 99.22 perf-profile.children.cycles-pp.__x64_sys_futex
99.02 +0.2 99.22 perf-profile.children.cycles-pp.do_futex
0.00 +0.3 0.33 perf-profile.children.cycles-pp.futex_hash
55.19 +1.5 56.68 perf-profile.children.cycles-pp.futex_wake
97.95 -0.2 97.71 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.37 -0.1 0.26 perf-profile.self.cycles-pp.pthread_mutex_lock
0.18 ± 4% -0.1 0.09 ± 6% perf-profile.self.cycles-pp.sched_balance_domains
0.08 -0.0 0.06 perf-profile.self.cycles-pp.futex_wait_setup
0.07 +0.0 0.12 perf-profile.self.cycles-pp.futex_q_lock
0.00 +0.1 0.05 perf-profile.self.cycles-pp.futex_q_unlock
0.00 +0.1 0.08 perf-profile.self.cycles-pp._raw_spin_lock
0.00 +0.2 0.17 perf-profile.self.cycles-pp.futex_hash_put
0.00 +0.3 0.33 perf-profile.self.cycles-pp.futex_hash
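The %change column in the comparison above can be cross-checked from the two value columns. The following is a minimal sketch, assuming %change is the plain relative difference between the base (7c4f75a21f) and patched (bd54df5ea7) means; the helper name pct_change is hypothetical and is not part of lkp-tests. It applies only to rows reported as a percentage, not to rows such as mpstat.cpu.all.usr% or perf-profile.*, where the middle column is an absolute difference in percentage points.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change from base to patched, in percent (assumed definition)."""
    return (patched - base) / base * 100.0


# (base, patched) pairs copied from the comparison table in this report.
rows = {
    "will-it-scale.per_thread_ops": (37367, 50042),
    "will-it-scale.workload": (9566224, 12810946),
    "vmstat.system.cs": (206687, 137401),
}

for name, (base, patched) in rows.items():
    print(f"{name}: {pct_change(base, patched):+.1f}%")
```

For these three rows the sketch prints +33.9%, +33.9% and -33.5%, which agrees with the reported values.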
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki