From: kernel test robot <oliver.sang@intel.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>, <oliver.sang@intel.com>
Subject: [bigeasy-staging:futex_local_v6] [futex]  99b9906f6c: will-it-scale.per_thread_ops 97.8% regression
Date: Tue, 31 Dec 2024 16:47:57 +0800
Message-ID: <202412311453.4232da5f-lkp@intel.com>



Hello,

kernel test robot noticed a 97.8% regression of will-it-scale.per_thread_ops on:


commit: 99b9906f6cb6c689ccccef3b8e0a7a5af7f80960 ("futex: Allow automatic allocation of process wide futex hash.")
https://git.kernel.org/cgit/linux/kernel/git/bigeasy/staging.git futex_local_v6

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
parameters:

	nr_task: 100%
	mode: thread
	test: futex4
	cpufreq_governor: performance
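The report does not include the futex4 source. As a rough, hypothetical approximation, assume each worker repeatedly calls FUTEX_WAIT_PRIVATE on a futex word with an expected value that does not match, so every call returns EAGAIN without sleeping. That assumption is consistent with the profile further down, where the futex_wait path runs through futex_wait_setup, futex_hash, futex_q_lock and futex_q_unlock. The function name futex_wait_mismatch_loop is illustrative and not taken from will-it-scale.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical approximation of one futex4 worker loop (NOT the
 * will-it-scale source). The futex word holds 0 but we pass 1 as the
 * expected value, so every FUTEX_WAIT_PRIVATE call fails with EAGAIN
 * instead of sleeping. In the kernel, each call still hashes the address
 * and takes and releases a hash-bucket spinlock before returning.
 */
unsigned long long futex_wait_mismatch_loop(unsigned long long iterations)
{
	uint32_t word = 0;              /* futex word private to this caller */
	unsigned long long eagain = 0;  /* calls that returned EAGAIN */

	for (unsigned long long i = 0; i < iterations; i++) {
		long rc = syscall(SYS_futex, &word, FUTEX_WAIT_PRIVATE,
				  1 /* expected value != word */, NULL, NULL, 0);
		/* A value mismatch must never put the caller to sleep. */
		assert(rc == -1);
		if (errno == EAGAIN)
			eagain++;
	}
	return eagain;
}
```

With `mode: thread` and `nr_task: 100%`, the benchmark would run a loop like this in 224 threads of a single process and count completed iterations; that multi-threaded driver is omitted here.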




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202412311453.4232da5f-lkp@intel.com


Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241231/202412311453.4232da5f-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/futex4/will-it-scale

commit: 
  1cb2a2d0d7 ("futex: Add basic infrastructure for local task local hash.")
  99b9906f6c ("futex: Allow automatic allocation of process wide futex hash.")
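The headline rows in the table that follows can be cross-checked with simple arithmetic. will-it-scale.workload equals will-it-scale.224.threads, so per_thread_ops is approximately workload / 224, and the %change column is (new - old) / old * 100. Rows whose metric is already a percentage (mpstat.cpu.all.sys%, the perf-profile rows) appear instead to show an absolute difference in percentage points: 72.43 to 98.35 is +25.9. The helpers below are a sketch with illustrative names, not lkp-tests code.

```c
#include <assert.h>

/* Relative change in percent, as in the %change column. */
double pct_change(double old_val, double new_val)
{
	assert(old_val != 0.0);
	return (new_val - old_val) / old_val * 100.0;
}

/* Absolute difference, apparently used for rows that are already percentages. */
double pp_change(double old_pct, double new_pct)
{
	return new_pct - old_pct;
}

/* Average operations per thread, given the aggregate workload. */
double per_thread_ops(double workload, unsigned int threads)
{
	assert(threads > 0);
	return workload / threads;
}
```

For example, pct_change(5159474, 114403) is about -97.78, which rounds to the reported -97.8%, and per_thread_ops(25626505, 224) is about 114404, within rounding of the reported 114403.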

1cb2a2d0d7bad8ea 99b9906f6cb6c689ccccef3b8e0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    137104 ±  4%     +32.0%     181042 ±  5%  meminfo.Mapped
   1181264           +11.2%    1314118        meminfo.Shmem
     72.61           +34.8%      97.86        vmstat.cpu.sy
     25.61           -96.1%       0.99        vmstat.cpu.us
 1.156e+09           -97.8%   25626505        will-it-scale.224.threads
   5159474           -97.8%     114403        will-it-scale.per_thread_ops
 1.156e+09           -97.8%   25626505        will-it-scale.workload
      0.42 ±  4%      -0.1        0.34        mpstat.cpu.all.irq%
      0.01 ±  6%      -0.0        0.00 ± 10%  mpstat.cpu.all.soft%
     72.43           +25.9       98.35        mpstat.cpu.all.sys%
     25.90           -25.2        0.67        mpstat.cpu.all.usr%
     14.50 ± 21%   +1918.4%     292.67 ± 10%  perf-c2c.DRAM.local
    712.67 ± 49%  +25244.0%     180618 ±  3%  perf-c2c.DRAM.remote
    575.17 ± 27%  +41178.0%     237417 ±  2%  perf-c2c.HITM.local
    677.50 ± 51%  +25438.2%     173021 ±  3%  perf-c2c.HITM.remote
      1252 ± 28%  +32665.2%     410438 ±  3%  perf-c2c.HITM.total
    229056 ± 66%    +168.1%     614175 ± 21%  numa-meminfo.node0.Active
    229056 ± 66%    +168.1%     614175 ± 21%  numa-meminfo.node0.Active(anon)
   1731085 ±  8%     -14.4%    1481532 ±  9%  numa-meminfo.node1.Active
   1731085 ±  8%     -14.4%    1481532 ±  9%  numa-meminfo.node1.Active(anon)
    169873 ± 56%     -54.8%      76863 ±135%  numa-meminfo.node1.AnonHugePages
     69383 ± 52%     +82.5%     126612 ± 27%  numa-meminfo.node1.Mapped
     57260 ± 66%    +168.2%     153546 ± 21%  numa-vmstat.node0.nr_active_anon
     57259 ± 66%    +168.2%     153546 ± 21%  numa-vmstat.node0.nr_zone_active_anon
    432835 ±  8%     -14.4%     370456 ±  9%  numa-vmstat.node1.nr_active_anon
     82.98 ± 56%     -54.7%      37.56 ±135%  numa-vmstat.node1.nr_anon_transparent_hugepages
     17168 ± 53%     +85.9%      31921 ± 27%  numa-vmstat.node1.nr_mapped
    432834 ±  8%     -14.4%     370456 ±  9%  numa-vmstat.node1.nr_zone_active_anon
  63231066 ± 14%     -23.1%   48613348 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.max
  63231066 ± 14%     -23.1%   48613348 ±  3%  sched_debug.cfs_rq:/.min_vruntime.max
   2363398 ± 76%     -48.5%    1217003 ± 12%  sched_debug.cpu.avg_idle.max
    152445 ± 64%     -40.5%      90724 ± 13%  sched_debug.cpu.avg_idle.stddev
    680860 ± 33%     -21.7%     532874 ±  3%  sched_debug.cpu.max_idle_balance_cost.max
     13728 ±103%     -74.6%       3490 ± 69%  sched_debug.cpu.max_idle_balance_cost.stddev
    490095            +6.9%     524105        proc-vmstat.nr_active_anon
   1171706            +2.8%    1204774        proc-vmstat.nr_file_pages
     34114 ±  4%     +32.9%      45347 ±  5%  proc-vmstat.nr_mapped
    295322           +11.2%     328395        proc-vmstat.nr_shmem
    490095            +6.9%     524105        proc-vmstat.nr_zone_active_anon
   1714481            +2.8%    1761991        proc-vmstat.numa_hit
   1482692            +3.2%    1530183        proc-vmstat.numa_local
     38309 ± 18%     -44.4%      21282 ± 29%  proc-vmstat.numa_pages_migrated
     38309 ± 18%     -44.4%      21282 ± 29%  proc-vmstat.pgmigrate_success
      0.00 ± 20%     +95.5%       0.01 ± 23%  perf-sched.sch_delay.avg.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.09 ± 58%     -96.7%       0.00 ±180%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      0.00           +54.2%       0.01 ± 17%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.00           +66.7%       0.01 ±  7%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ±  7%   +6803.4%       0.67 ± 69%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.00 ± 27%    +139.1%       0.01 ± 26%  perf-sched.sch_delay.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
      5.59 ± 30%     -39.8%       3.36 ± 42%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      1.81 ± 75%     -99.7%       0.00 ±187%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      0.01 ±  7%    +151.6%       0.01 ± 15%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      2.93 ±  4%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     12.18 ±  7%     -46.2%       6.55 ±  4%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    613.33 ±  4%    -100.0%       0.00        perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    391.67 ±  5%     +95.9%     767.17 ±  4%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    863.81          +110.2%       1815 ± 37%  perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
     11.17 ± 30%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    454.33 ±  6%     -28.3%     325.83 ± 10%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.09 ± 58%     -96.7%       0.00 ±180%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      2.86 ± 10%     -30.5%       1.99        perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     12.17 ±  7%     -46.2%       6.55 ±  4%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    863.81          +110.2%       1815 ± 37%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      5.59 ± 30%     -39.8%       3.36 ± 42%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      1.81 ± 75%     -99.7%       0.00 ±187%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      5.00           -53.4%       2.33 ± 32%  perf-sched.wait_time.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
    454.32 ±  6%     -28.3%     325.82 ± 10%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.02 ±108%   +5957.2%       1.40        perf-stat.i.MPKI
 1.092e+11           -87.4%  1.373e+10        perf-stat.i.branch-instructions
      0.03 ± 72%      +0.2        0.26        perf-stat.i.branch-miss-rate%
   8332453          +314.7%   34555257        perf-stat.i.branch-misses
   2840046 ± 77%   +2954.6%   86752119        perf-stat.i.cache-misses
   9562648 ± 32%   +1852.1%  1.867e+08        perf-stat.i.cache-references
      0.89         +1072.3%      10.39        perf-stat.i.cpi
    294.53            -4.9%     280.02        perf-stat.i.cpu-migrations
    560955 ± 57%     -98.7%       7456        perf-stat.i.cycles-between-cache-misses
 7.327e+11           -91.5%  6.224e+10        perf-stat.i.instructions
      1.13           -91.3%       0.10        perf-stat.i.ipc
      0.00 ± 77%  +35611.7%       1.39        perf-stat.overall.MPKI
      0.01 ±  2%      +0.2        0.25        perf-stat.overall.branch-miss-rate%
      0.88         +1083.5%      10.40        perf-stat.overall.cpi
    367727 ± 50%     -98.0%       7462        perf-stat.overall.cycles-between-cache-misses
      1.14           -91.6%       0.10        perf-stat.overall.ipc
    192321          +280.7%     732210        perf-stat.overall.path-length
 1.088e+11           -87.4%  1.369e+10        perf-stat.ps.branch-instructions
   8267617          +316.3%   34415093        perf-stat.ps.branch-misses
   2845099 ± 77%   +2939.7%   86483187        perf-stat.ps.cache-misses
   9576433 ± 32%   +1844.0%  1.862e+08        perf-stat.ps.cache-references
    292.67            -4.8%     278.67        perf-stat.ps.cpu-migrations
 7.301e+11           -91.5%  6.204e+10        perf-stat.ps.instructions
 2.223e+14           -91.6%  1.876e+13        perf-stat.total.instructions
     30.26           -30.3        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
      9.36            -9.4        0.00        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      4.89            -2.7        2.21 ±  4%  perf-profile.calltrace.cycles-pp.futex_q_unlock.futex_wait_setup.__futex_wait.futex_wait.do_futex
     91.58            +7.8       99.41        perf-profile.calltrace.cycles-pp.syscall
     66.02           +33.0       98.98        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
     63.05           +35.9       98.92        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     46.61           +52.0       98.58        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     43.11           +55.4       98.51        perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     40.51           +57.9       98.45        perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     36.03           +62.3       98.37        perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
     30.30           +67.9       98.25        perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex
     10.94           +84.8       95.72        perf-profile.calltrace.cycles-pp.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex
      3.54           +91.2       94.70        perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
      0.00           +94.5       94.51        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.futex_q_lock.futex_wait_setup.__futex_wait
     16.73           -16.4        0.37        perf-profile.children.cycles-pp.entry_SYSCALL_64
      9.92            -9.7        0.22        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      7.78            -7.6        0.19        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      3.81            -3.7        0.08        perf-profile.children.cycles-pp.futex_hash
      5.09            -2.9        2.21 ±  4%  perf-profile.children.cycles-pp.futex_q_unlock
      2.80            -2.7        0.07        perf-profile.children.cycles-pp.get_futex_key
      2.64            -2.6        0.05        perf-profile.children.cycles-pp.x64_sys_call
      2.38            -2.3        0.06        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.36 ±  4%      -0.1        0.24 ± 11%  perf-profile.children.cycles-pp.__cmd_record
      0.36 ±  4%      -0.1        0.24 ± 11%  perf-profile.children.cycles-pp.cmd_record
      0.36 ±  4%      -0.1        0.24 ± 11%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.36 ±  4%      -0.1        0.24 ± 10%  perf-profile.children.cycles-pp.handle_internal_command
      0.36 ±  4%      -0.1        0.24 ± 10%  perf-profile.children.cycles-pp.main
      0.36 ±  4%      -0.1        0.24 ± 10%  perf-profile.children.cycles-pp.run_builtin
      0.34 ±  5%      -0.1        0.23 ± 11%  perf-profile.children.cycles-pp.perf_mmap__push
      0.22 ±  4%      -0.1        0.12 ±  7%  perf-profile.children.cycles-pp.record__pushfn
      0.22 ±  5%      -0.1        0.13 ± 10%  perf-profile.children.cycles-pp.writen
      0.22 ±  4%      -0.1        0.13 ± 10%  perf-profile.children.cycles-pp.write
      0.47 ±  6%      -0.1        0.39 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.19 ±  5%      -0.1        0.11 ± 11%  perf-profile.children.cycles-pp.ksys_write
      0.44 ±  6%      -0.1        0.36 ±  3%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.42 ±  6%      -0.1        0.34 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.43 ±  6%      -0.1        0.35 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.18 ±  5%      -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.vfs_write
      0.17 ±  6%      -0.1        0.10 ± 11%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.16 ±  6%      -0.1        0.09 ±  9%  perf-profile.children.cycles-pp.generic_perform_write
      0.11 ± 11%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.ktime_get
      0.11 ± 14%      -0.0        0.08 ±  7%  perf-profile.children.cycles-pp.clockevents_program_event
      0.24 ±  5%      -0.0        0.22 ±  3%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.25 ±  4%      -0.0        0.22 ±  3%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.07 ±  5%      -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.ring_buffer_read_head
      0.09 ±  4%      -0.0        0.06 ±  7%  perf-profile.children.cycles-pp.get_jiffies_update
      0.09 ±  4%      -0.0        0.06 ±  7%  perf-profile.children.cycles-pp.tmigr_requires_handle_remote
      0.08 ±  6%      -0.0        0.06 ± 13%  perf-profile.children.cycles-pp.perf_mmap__read_head
      0.21 ±  3%      -0.0        0.19 ±  3%  perf-profile.children.cycles-pp.update_process_times
     98.27            +1.4       99.72        perf-profile.children.cycles-pp.syscall
     66.44           +32.7       99.11        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     64.06           +35.0       99.06        perf-profile.children.cycles-pp.do_syscall_64
     47.21           +51.4       98.59        perf-profile.children.cycles-pp.__x64_sys_futex
     43.97           +54.6       98.53        perf-profile.children.cycles-pp.do_futex
     40.72           +57.7       98.46        perf-profile.children.cycles-pp.futex_wait
     36.50           +61.9       98.38        perf-profile.children.cycles-pp.__futex_wait
     31.35           +66.9       98.27        perf-profile.children.cycles-pp.futex_wait_setup
     11.59           +84.1       95.73        perf-profile.children.cycles-pp.futex_q_lock
      3.73           +91.0       94.73        perf-profile.children.cycles-pp._raw_spin_lock
      0.03 ±147%     +94.5       94.54        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     17.41           -17.0        0.38        perf-profile.self.cycles-pp.syscall
     12.41           -12.1        0.28        perf-profile.self.cycles-pp.futex_wait_setup
      8.53            -8.3        0.19        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      7.57            -7.4        0.18 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      5.08            -5.0        0.10 ±  4%  perf-profile.self.cycles-pp.__futex_wait
      4.61            -4.5        0.10        perf-profile.self.cycles-pp.do_syscall_64
      3.60            -3.5        0.08        perf-profile.self.cycles-pp.futex_hash
      3.54            -3.5        0.07        perf-profile.self.cycles-pp.do_futex
      3.55            -3.3        0.20 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock
      3.31            -3.2        0.07        perf-profile.self.cycles-pp.futex_wait
      3.24            -3.2        0.06        perf-profile.self.cycles-pp.__x64_sys_futex
      3.19            -3.1        0.07        perf-profile.self.cycles-pp.entry_SYSCALL_64
      3.97 ±  3%      -3.1        0.92 ±  2%  perf-profile.self.cycles-pp.futex_q_lock
      4.82            -2.6        2.20 ±  4%  perf-profile.self.cycles-pp.futex_q_unlock
      2.56            -2.5        0.06        perf-profile.self.cycles-pp.get_futex_key
      2.55            -2.5        0.05 ±  7%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      2.38            -2.3        0.06        perf-profile.self.cycles-pp.syscall_return_via_sysret
      2.34            -2.3        0.02 ± 99%  perf-profile.self.cycles-pp.x64_sys_call
      0.06 ±  7%      -0.0        0.03 ±100%  perf-profile.self.cycles-pp.ring_buffer_read_head
      0.10 ± 15%      -0.0        0.07 ±  8%  perf-profile.self.cycles-pp.ktime_get
      0.09 ±  4%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.get_jiffies_update
      0.03 ±147%     +94.1       94.18        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
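Several perf-stat.overall rows are ratios of the raw counters. The reported values are consistent with cpi = 1 / ipc, path-length = perf-stat.total.instructions / will-it-scale.workload (instructions retired per futex call), MPKI = cache-misses * 1000 / instructions, and cycles-between-cache-misses = instructions * cpi / cache-misses. These formulas are inferred from the numbers rather than taken from the lkp-tests source, so treat the sketch below, and its function names, as an assumption.

```c
#include <assert.h>

/* Cycles per instruction is the reciprocal of instructions per cycle. */
double cpi_from_ipc(double ipc)
{
	assert(ipc > 0.0);
	return 1.0 / ipc;
}

/* Instructions retired per unit of benchmark work (per futex call here). */
double path_length(double total_instructions, double workload_ops)
{
	assert(workload_ops > 0.0);
	return total_instructions / workload_ops;
}

/* Cache misses per thousand instructions. */
double mpki(double cache_misses, double instructions)
{
	assert(instructions > 0.0);
	return cache_misses * 1000.0 / instructions;
}

/* Average cycles between cache misses, with cycles = instructions * cpi. */
double cycles_between_misses(double instructions, double cpi,
			     double cache_misses)
{
	assert(cache_misses > 0.0);
	return instructions * cpi / cache_misses;
}
```

With the perf-stat.ps values for the new kernel, mpki(86483187, 6.204e10) is about 1.39 and cycles_between_misses(6.204e10, 10.40, 86483187) is about 7460, close to the reported 1.39 and 7462. The baseline cycles-between-cache-misses (367727 ± 50%) is not reproduced by this ratio, likely because the baseline cache-miss counts vary by up to ±77% between runs.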




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki