From: kernel test robot <oliver.sang@intel.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>, <x86@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>, <linux-mm@kvack.org>,
	<oliver.sang@intel.com>
Subject: [tip:locking/futex] [futex] bd54df5ea7: will-it-scale.per_thread_ops 33.9% improvement
Date: Wed, 14 May 2025 10:33:43 +0800
Message-ID: <202505131609.20984254-lkp@intel.com>
Hello,
kernel test robot noticed a 33.9% improvement of will-it-scale.per_thread_ops on:
commit: bd54df5ea7cadac520e346d5f0fe5d58e635b6ba ("futex: Allow to resize the private local hash")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git locking/futex
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6767P CPU @ 2.4GHz (Granite Rapids) with 256G memory
parameters:
	nr_task: 100%
	mode: thread
	test: pthread_mutex5
	cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250513/202505131609.20984254-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-gnr-2sp3/pthread_mutex5/will-it-scale
commit:
7c4f75a21f ("futex: Allow automatic allocation of process wide futex hash")
bd54df5ea7 ("futex: Allow to resize the private local hash")
7c4f75a21f636486 bd54df5ea7cadac520e346d5f0f
---------------- ---------------------------
value ± %stddev    %change    value ± %stddev    metric
23570282 -32.6% 15883630 ± 2% cpuidle..usage
1862635 -9.3% 1689404 meminfo.Shmem
2110 +19.0% 2512 ± 3% perf-c2c.DRAM.local
0.16 ± 4% -0.1 0.08 ± 4% mpstat.cpu.all.soft%
0.63 -0.2 0.46 ± 3% mpstat.cpu.all.usr%
1264859 ± 2% -47.5% 664434 ± 62% numa-vmstat.node1.nr_file_pages
38897 ± 10% -47.8% 20323 ± 48% numa-vmstat.node1.nr_mapped
206687 -33.5% 137401 ± 2% vmstat.system.cs
427708 -8.0% 393532 vmstat.system.in
5060133 ± 2% -47.5% 2658326 ± 62% numa-meminfo.node1.FilePages
158778 ± 10% -48.5% 81837 ± 46% numa-meminfo.node1.Mapped
6620342 ± 2% -38.3% 4086741 ± 37% numa-meminfo.node1.MemUsed
9566224 +33.9% 12810946 will-it-scale.256.threads
0.18 -11.1% 0.16 will-it-scale.256.threads_idle
37367 +33.9% 50042 will-it-scale.per_thread_ops
9566224 +33.9% 12810946 will-it-scale.workload
0.00 ± 15% +29.7% 0.00 ± 15% sched_debug.cpu.next_balance.stddev
124704 -33.5% 82964 ± 2% sched_debug.cpu.nr_switches.avg
230832 ± 52% -38.2% 142628 ± 5% sched_debug.cpu.nr_switches.max
98911 ± 4% -33.7% 65543 ± 3% sched_debug.cpu.nr_switches.min
17307 ± 60% -47.4% 9105 ± 20% sched_debug.cpu.nr_switches.stddev
672002 -6.5% 628169 proc-vmstat.nr_active_anon
1345624 -3.2% 1302363 proc-vmstat.nr_file_pages
41725 ± 7% -16.3% 34939 ± 12% proc-vmstat.nr_mapped
465688 -9.3% 422425 proc-vmstat.nr_shmem
672002 -6.5% 628169 proc-vmstat.nr_zone_active_anon
1956811 -2.5% 1908264 proc-vmstat.numa_hit
1692181 -2.8% 1644262 proc-vmstat.numa_local
0.20 +4.3% 0.21 perf-stat.i.MPKI
0.05 -0.0 0.05 perf-stat.i.branch-miss-rate%
9101814 -10.3% 8161953 perf-stat.i.branch-misses
14404131 +3.7% 14939924 perf-stat.i.cache-misses
207911 -33.5% 138184 ± 2% perf-stat.i.context-switches
65204 -4.0% 62625 perf-stat.i.cycles-between-cache-misses
0.01 -95.2% 0.00 ±223% perf-stat.i.metric.K/sec
0.20 +4.2% 0.21 perf-stat.overall.MPKI
0.05 -0.0 0.05 perf-stat.overall.branch-miss-rate%
63438 -3.5% 61223 perf-stat.overall.cycles-between-cache-misses
2250086 -25.7% 1671327 perf-stat.overall.path-length
9086343 -10.4% 8139691 perf-stat.ps.branch-misses
14400345 +3.6% 14922252 perf-stat.ps.cache-misses
207422 -33.5% 137839 ± 2% perf-stat.ps.context-switches
0.16 +99.2% 0.32 ± 95% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
1.66 ± 12% +17.5% 1.95 ± 3% perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
0.08 ± 8% +37.8% 0.12 ± 20% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.01 ± 12% +47.5% 0.01 ± 5% perf-sched.sch_delay.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
0.09 ±166% +1763.7% 1.74 ± 65% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
0.09 +16.3% 0.11 ± 3% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.98 ± 14% +28.2% 3.83 ± 4% perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
0.18 ± 5% +248.1% 0.61 ± 63% perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.15 ±186% +1714.0% 2.76 ± 49% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
0.01 ± 12% +45.3% 0.02 ± 5% perf-sched.total_sch_delay.average.ms
2.91 ± 2% +61.4% 4.69 ± 4% perf-sched.total_wait_and_delay.average.ms
556081 ± 2% -37.0% 350186 ± 2% perf-sched.total_wait_and_delay.count.ms
2.89 ± 2% +61.5% 4.67 ± 4% perf-sched.total_wait_time.average.ms
0.01 ± 6% +35.6% 0.02 ± 3% perf-sched.wait_and_delay.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
18.90 ± 3% -15.5% 15.98 perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
541651 ± 2% -37.0% 341352 ± 2% perf-sched.wait_and_delay.count.futex_do_wait.__futex_wait.futex_wait.do_futex
11.50 ± 18% -84.1% 1.83 ±223% perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
253.67 ± 3% +17.1% 297.00 perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.09 ±166% +1763.7% 1.74 ± 65% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
18.79 ± 3% -15.6% 15.85 perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.15 ±186% +1714.0% 2.76 ± 49% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
43.55 -1.5 42.06 perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex
43.54 -1.5 42.04 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.futex_wait_setup.__futex_wait.futex_wait
43.83 -1.3 42.54 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
43.83 -1.3 42.54 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
43.76 -1.3 42.48 perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex
99.06 +0.2 99.25 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
99.05 +0.2 99.24 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
99.03 +0.2 99.22 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
99.02 +0.2 99.22 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
54.99 +1.1 56.14 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.futex_wake.do_futex.__x64_sys_futex
55.02 +1.2 56.21 perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
55.19 +1.5 56.68 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
43.83 -1.3 42.54 perf-profile.children.cycles-pp.__futex_wait
43.83 -1.3 42.54 perf-profile.children.cycles-pp.futex_wait
43.76 -1.3 42.48 perf-profile.children.cycles-pp.futex_wait_setup
98.55 -0.3 98.21 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
98.59 -0.3 98.28 perf-profile.children.cycles-pp._raw_spin_lock
0.37 -0.1 0.26 perf-profile.children.cycles-pp.pthread_mutex_lock
0.60 ± 3% -0.1 0.49 ± 3% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.58 ± 3% -0.1 0.47 ± 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.20 ± 5% -0.1 0.10 ± 9% perf-profile.children.cycles-pp.handle_softirqs
0.18 ± 5% -0.1 0.09 ± 6% perf-profile.children.cycles-pp.sched_balance_domains
0.21 ± 4% -0.1 0.12 ± 4% perf-profile.children.cycles-pp.__irq_exit_rcu
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.common_startup_64
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.cpu_startup_entry
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.do_idle
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.start_secondary
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.acpi_idle_do_entry
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.acpi_idle_enter
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.acpi_safe_halt
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.pv_native_safe_halt
0.11 ± 4% -0.0 0.08 perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.10 -0.0 0.07 ± 5% perf-profile.children.cycles-pp.__schedule
0.11 -0.0 0.08 ± 4% perf-profile.children.cycles-pp.cpuidle_enter
0.06 ± 7% -0.0 0.03 ± 70% perf-profile.children.cycles-pp.futex_do_wait
0.11 ± 3% -0.0 0.08 ± 4% perf-profile.children.cycles-pp.cpuidle_enter_state
0.11 -0.0 0.08 perf-profile.children.cycles-pp.cpuidle_idle_call
0.08 -0.0 0.05 ± 7% perf-profile.children.cycles-pp.sysvec_call_function_single
0.00 +0.1 0.05 perf-profile.children.cycles-pp.futex_q_unlock
0.07 +0.1 0.12 ± 3% perf-profile.children.cycles-pp.futex_q_lock
0.00 +0.2 0.17 perf-profile.children.cycles-pp.futex_hash_put
99.22 +0.2 99.40 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
99.22 +0.2 99.40 perf-profile.children.cycles-pp.do_syscall_64
99.03 +0.2 99.22 perf-profile.children.cycles-pp.__x64_sys_futex
99.02 +0.2 99.22 perf-profile.children.cycles-pp.do_futex
0.00 +0.3 0.33 perf-profile.children.cycles-pp.futex_hash
55.19 +1.5 56.68 perf-profile.children.cycles-pp.futex_wake
97.95 -0.2 97.71 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.37 -0.1 0.26 perf-profile.self.cycles-pp.pthread_mutex_lock
0.18 ± 4% -0.1 0.09 ± 6% perf-profile.self.cycles-pp.sched_balance_domains
0.08 -0.0 0.06 perf-profile.self.cycles-pp.futex_wait_setup
0.07 +0.0 0.12 perf-profile.self.cycles-pp.futex_q_lock
0.00 +0.1 0.05 perf-profile.self.cycles-pp.futex_q_unlock
0.00 +0.1 0.08 perf-profile.self.cycles-pp._raw_spin_lock
0.00 +0.2 0.17 perf-profile.self.cycles-pp.futex_hash_put
0.00 +0.3 0.33 perf-profile.self.cycles-pp.futex_hash
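As a cross-check on the %change column, the headline figures can be recomputed from the two value columns. The snippet below is only a sketch: the helper name pct_change and the choice of rows are illustrative and not part of the lkp tooling, and it assumes %change is the plain relative difference between the parent commit (7c4f75a21f) and bd54df5ea7.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


# (base, new) pairs copied from the comparison table in this report:
# left column is 7c4f75a21f, right column is bd54df5ea7.
rows = {
    "will-it-scale.workload": (9566224, 12810946),
    "will-it-scale.per_thread_ops": (37367, 50042),
    "vmstat.system.cs": (206687, 137401),
    "perf-stat.overall.path-length": (2250086, 1671327),
}

for metric, (base, new) in rows.items():
    # Print with one decimal place, matching the table's precision.
    print(f"{metric:32s} {pct_change(base, new):+6.1f}%")
```

Rounded to one decimal, these rows come out at +33.9%, +33.9%, -33.5% and -25.7%, which agrees with the table. Rows whose change is shown without a percent sign (the mpstat and perf-profile entries) appear to be absolute differences in percentage points, so this formula does not apply to them.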
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki