[tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression


* [tip:sched/core] [sched]  2ae891b826:  hackbench.throughput 6.2% regression
@ 2025-02-25  2:32 kernel test robot
  2025-02-25  9:31 ` Chen Yu
  0 siblings, 1 reply; 9+ messages in thread
From: kernel test robot @ 2025-02-25  2:32 UTC
  To: zihan zhou
  Cc: oe-lkp, lkp, linux-kernel, x86, Peter Zijlstra, Vincent Guittot,
	aubrey.li, yu.c.chen, oliver.sang



Hello,

kernel test robot noticed a 6.2% regression of hackbench.throughput on:


commit: 2ae891b826958b60919ea21c727f77bcd6ffcc2c ("sched: Reduce the default slice to avoid tasks getting an extra tick")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core

[test failed on linux-next/master d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa]

testcase: hackbench
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	iterations: 4
	mode: process
	ipc: socket
	cpufreq_governor: performance


In addition, the commit also has a significant impact on the following tests:

+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.membarrier.ops_per_sec  10.5% regression                             |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | nr_threads=100%                                                                           |
|                  | test=membarrier                                                                           |
|                  | testtime=60s                                                                              |
+------------------+-------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202502251026.bb927780-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250225/202502251026.bb927780-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
  gcc-12/performance/socket/4/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench

commit: 
  f553741ac8 ("sched: Cancel the slice protection of the idle entity")
  2ae891b826 ("sched: Reduce the default slice to avoid tasks getting an extra tick")

f553741ac8c0e467 2ae891b826958b60919ea21c727 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      5457 ±  6%     +30.9%       7146 ± 11%  perf-c2c.DRAM.remote
      1156 ± 17%     +76.3%       2038 ± 19%  perf-c2c.HITM.remote
    790654 ±  2%     +22.8%     971104        sched_debug.cpu.nr_switches.avg
    659209 ±  2%     +24.6%     821703 ±  3%  sched_debug.cpu.nr_switches.min
   1706905           +20.0%    2047861        vmstat.system.cs
    296017            +5.8%     313318 ±  2%  vmstat.system.in
     15076 ± 48%    +121.3%      33360 ± 35%  proc-vmstat.numa_pages_migrated
   3389933 ±  5%     +15.3%    3907919 ±  3%  proc-vmstat.pgalloc_normal
   2565152 ±  6%     +27.9%    3280218 ±  5%  proc-vmstat.pgfree
     15076 ± 48%    +121.3%      33360 ± 35%  proc-vmstat.pgmigrate_success
    781.28 ± 57%    -100.0%       0.08 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
      3394 ± 51%    -100.0%       0.08 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
      0.18 ± 74%   +3280.0%       6.22 ±125%  perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     42.40 ± 41%     -62.7%      15.83 ± 60%  perf-sched.wait_and_delay.count.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     86.80 ± 42%     -89.4%       9.17 ± 97%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_anon_vmas
    977.49 ± 51%     -99.9%       0.95 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
      3397 ± 50%    -100.0%       0.95 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
    433157            -6.2%     406447        hackbench.throughput
    423258            -6.9%     394005        hackbench.throughput_avg
    433157            -6.2%     406447        hackbench.throughput_best
    411374            -6.8%     383238        hackbench.throughput_worst
    143.13            +7.3%     153.65        hackbench.time.elapsed_time
    143.13            +7.3%     153.65        hackbench.time.elapsed_time.max
  39754543 ±  3%     +56.8%   62349308        hackbench.time.involuntary_context_switches
    623881            +3.9%     648284        hackbench.time.minor_page_faults
     17045            +7.7%      18350        hackbench.time.system_time
    900.50            +2.5%     922.71        hackbench.time.user_time
 2.019e+08           +23.3%  2.489e+08        hackbench.time.voluntary_context_switches
      1.61            -2.3%       1.57        perf-stat.i.MPKI
 4.411e+10            -5.0%  4.192e+10        perf-stat.i.branch-instructions
      0.41 ±  2%      +0.0        0.44        perf-stat.i.branch-miss-rate%
 1.744e+08            +1.6%  1.772e+08        perf-stat.i.branch-misses
     25.15            -0.6       24.50        perf-stat.i.cache-miss-rate%
   3.5e+08            -7.0%  3.255e+08        perf-stat.i.cache-misses
 1.398e+09            -3.8%  1.346e+09        perf-stat.i.cache-references
   1677956 ±  2%     +20.8%    2027400        perf-stat.i.context-switches
      1.49            +5.6%       1.57        perf-stat.i.cpi
     46084 ±  8%     +44.6%      66621 ±  8%  perf-stat.i.cpu-migrations
    935.91            +8.3%       1013        perf-stat.i.cycles-between-cache-misses
 2.175e+11            -5.1%  2.065e+11        perf-stat.i.instructions
      0.68            -5.2%       0.64        perf-stat.i.ipc
     13.38 ±  2%     +21.7%      16.28        perf-stat.i.metric.K/sec
      1.61            -2.0%       1.58        perf-stat.overall.MPKI
      0.39            +0.0        0.42        perf-stat.overall.branch-miss-rate%
     25.05            -0.8       24.23        perf-stat.overall.cache-miss-rate%
      1.49            +5.5%       1.57        perf-stat.overall.cpi
    926.46            +7.6%     996.92        perf-stat.overall.cycles-between-cache-misses
      0.67            -5.2%       0.64        perf-stat.overall.ipc
 4.382e+10            -5.0%  4.164e+10        perf-stat.ps.branch-instructions
  1.73e+08            +1.5%  1.755e+08        perf-stat.ps.branch-misses
 3.475e+08            -7.0%  3.233e+08        perf-stat.ps.cache-misses
 1.387e+09            -3.8%  1.334e+09        perf-stat.ps.cache-references
   1662988 ±  2%     +20.6%    2004942        perf-stat.ps.context-switches
     44600 ±  8%     +43.7%      64072 ±  7%  perf-stat.ps.cpu-migrations
 2.161e+11            -5.1%  2.051e+11        perf-stat.ps.instructions
 3.105e+13            +2.0%  3.169e+13        perf-stat.total.instructions
      8.54 ±  2%      -1.0        7.54        perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      8.46 ±  2%      -1.0        7.47        perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      8.30 ±  2%      -1.0        7.31        perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
      4.38 ±  2%      -0.6        3.81 ±  2%  perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
      3.20 ±  3%      -0.3        2.85        perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
      3.00 ±  3%      -0.3        2.67        perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
      3.40 ±  3%      -0.3        3.10 ±  3%  perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      2.30 ±  3%      -0.3        2.00        perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
      3.25 ±  3%      -0.3        2.97 ±  3%  perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
      3.07 ±  2%      -0.3        2.79 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      3.05 ±  3%      -0.3        2.79 ±  3%  perf-profile.calltrace.cycles-pp.sock_wfree.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic
      2.50 ±  3%      -0.2        2.29 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
      2.18 ±  3%      -0.2        1.99        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      1.99 ±  3%      -0.2        1.82        perf-profile.calltrace.cycles-pp.clear_bhb_loop.write
      1.95 ±  4%      -0.2        1.78 ±  2%  perf-profile.calltrace.cycles-pp.clear_bhb_loop.read
      2.68 ±  3%      -0.1        2.54 ±  2%  perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
      1.55 ±  3%      -0.1        1.42        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read
      1.55 ±  3%      -0.1        1.42        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      1.35 ±  3%      -0.1        1.24 ±  3%  perf-profile.calltrace.cycles-pp.__slab_free.kfree.skb_release_data.consume_skb.unix_stream_read_generic
      1.04 ±  3%      -0.1        0.96 ±  3%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      1.12 ±  3%      -0.1        1.04        perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
      0.62 ±  4%      -0.1        0.56        perf-profile.calltrace.cycles-pp.__build_skb_around.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
      0.72 ±  3%      -0.1        0.66        perf-profile.calltrace.cycles-pp.mod_objcg_state.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
      0.63 ±  2%      -0.1        0.57 ±  3%  perf-profile.calltrace.cycles-pp.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      0.57 ±  3%      -0.0        0.52 ±  2%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter
      1.17 ±  3%      +0.2        1.32 ±  6%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.42 ± 50%      +0.3        0.76 ± 22%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
      1.36 ±  3%      +0.5        1.88 ± 21%  perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic
      1.38 ±  3%      +0.5        1.91 ± 21%  perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
      1.43 ±  3%      +0.5        1.98 ± 21%  perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      1.63 ±  3%      +0.7        2.28 ± 21%  perf-profile.calltrace.cycles-pp.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
     36.49            +0.8       37.34        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     35.51            +0.9       36.43        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     38.59            +0.9       39.52        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
     38.32            +1.0       39.27        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     34.44            +1.0       35.42        perf-profile.calltrace.cycles-pp.sock_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     33.04            +1.1       34.12        perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write.do_syscall_64
      8.58 ±  2%      -1.0        7.58        perf-profile.children.cycles-pp.unix_stream_read_actor
      8.35 ±  2%      -1.0        7.36        perf-profile.children.cycles-pp.__skb_datagram_iter
      8.50 ±  2%      -1.0        7.51        perf-profile.children.cycles-pp.skb_copy_datagram_iter
      4.40 ±  2%      -0.6        3.83 ±  2%  perf-profile.children.cycles-pp._copy_to_iter
      5.77 ±  2%      -0.4        5.32 ±  3%  perf-profile.children.cycles-pp.__memcg_slab_free_hook
      4.41 ±  3%      -0.4        3.98        perf-profile.children.cycles-pp.__check_object_size
      4.80 ±  3%      -0.4        4.40        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      3.24 ±  3%      -0.4        2.89        perf-profile.children.cycles-pp.simple_copy_to_iter
      2.98 ±  3%      -0.3        2.64        perf-profile.children.cycles-pp.check_heap_object
      3.98 ±  3%      -0.3        3.64        perf-profile.children.cycles-pp.clear_bhb_loop
      3.44 ±  2%      -0.3        3.14 ±  3%  perf-profile.children.cycles-pp.skb_release_head_state
      3.31 ±  2%      -0.3        3.03 ±  3%  perf-profile.children.cycles-pp.unix_destruct_scm
      3.09 ±  3%      -0.3        2.82 ±  3%  perf-profile.children.cycles-pp.sock_wfree
      2.42 ±  3%      -0.2        2.23 ±  3%  perf-profile.children.cycles-pp.__slab_free
      2.59 ±  2%      -0.2        2.42 ±  2%  perf-profile.children.cycles-pp.mod_objcg_state
      1.78 ±  3%      -0.2        1.62        perf-profile.children.cycles-pp.entry_SYSCALL_64
      2.76 ±  3%      -0.1        2.61 ±  2%  perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
      1.38 ±  3%      -0.1        1.25        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.30 ±  4%      -0.1        1.19        perf-profile.children.cycles-pp.obj_cgroup_charge
      0.65 ±  4%      -0.1        0.57        perf-profile.children.cycles-pp.__build_skb_around
      0.66 ±  3%      -0.1        0.61        perf-profile.children.cycles-pp.refill_obj_stock
      0.73 ±  3%      -0.1        0.68        perf-profile.children.cycles-pp.__check_heap_object
      0.59 ±  3%      -0.1        0.54 ±  2%  perf-profile.children.cycles-pp.rw_verify_area
      0.66 ±  2%      -0.1        0.61 ±  3%  perf-profile.children.cycles-pp.skb_unlink
      0.55 ±  4%      -0.0        0.51 ±  2%  perf-profile.children.cycles-pp.__virt_addr_valid
      0.28 ±  3%      -0.0        0.26        perf-profile.children.cycles-pp.__scm_recv_common
      0.16 ±  4%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.is_vmalloc_addr
      0.16 ±  3%      -0.0        0.14 ±  2%  perf-profile.children.cycles-pp.security_socket_recvmsg
      0.17 ±  2%      -0.0        0.16        perf-profile.children.cycles-pp.put_pid
      0.14 ±  3%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.manage_oob
      0.11            -0.0        0.10        perf-profile.children.cycles-pp.wait_for_unix_gc
      0.06 ±  6%      +0.0        0.08 ± 11%  perf-profile.children.cycles-pp.os_xsave
      0.20 ±  3%      +0.0        0.23 ±  7%  perf-profile.children.cycles-pp.__get_user_8
      0.06 ±  6%      +0.0        0.09 ± 17%  perf-profile.children.cycles-pp.sched_clock
      0.06 ±  6%      +0.0        0.09 ± 14%  perf-profile.children.cycles-pp.check_preempt_wakeup_fair
      0.09 ±  5%      +0.0        0.13 ± 18%  perf-profile.children.cycles-pp.__switch_to
      0.08 ±  4%      +0.0        0.12 ± 21%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.15 ±  6%      +0.0        0.20 ± 10%  perf-profile.children.cycles-pp.__dequeue_entity
      0.25 ±  3%      +0.0        0.29 ±  9%  perf-profile.children.cycles-pp.rseq_ip_fixup
      0.09 ± 10%      +0.0        0.14 ± 15%  perf-profile.children.cycles-pp.pick_eevdf
      0.13 ±  7%      +0.0        0.18 ± 14%  perf-profile.children.cycles-pp.__enqueue_entity
      0.08 ± 10%      +0.0        0.12 ± 27%  perf-profile.children.cycles-pp.wakeup_preempt
      0.01 ±200%      +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.vruntime_eligible
      0.01 ±200%      +0.1        0.07 ± 23%  perf-profile.children.cycles-pp.___perf_sw_event
      0.01 ±200%      +0.1        0.08 ± 27%  perf-profile.children.cycles-pp.put_prev_entity
      0.31 ±  2%      +0.1        0.38 ± 12%  perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      0.22 ±  7%      +0.1        0.30 ± 12%  perf-profile.children.cycles-pp.set_next_entity
      0.14 ±  5%      +0.1        0.22 ± 22%  perf-profile.children.cycles-pp.pick_task_fair
      0.14 ± 44%      +0.1        0.24 ± 15%  perf-profile.children.cycles-pp.get_any_partial
      0.27 ±  5%      +0.1        0.37 ± 15%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.33 ±  4%      +0.1        0.47 ± 22%  perf-profile.children.cycles-pp.enqueue_entity
      0.30 ±  4%      +0.2        0.46 ± 26%  perf-profile.children.cycles-pp.update_load_avg
      0.48 ±  4%      +0.2        0.72 ± 26%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.51 ±  3%      +0.2        0.75 ± 27%  perf-profile.children.cycles-pp.enqueue_task
      0.48 ±  6%      +0.3        0.75 ± 23%  perf-profile.children.cycles-pp.pick_next_task_fair
      0.49 ±  6%      +0.3        0.76 ± 24%  perf-profile.children.cycles-pp.__pick_next_task
      0.60 ±  4%      +0.3        0.89 ± 25%  perf-profile.children.cycles-pp.ttwu_do_activate
      1.67 ±  2%      +0.6        2.23 ± 20%  perf-profile.children.cycles-pp.schedule_timeout
      1.64 ±  3%      +0.7        2.29 ± 21%  perf-profile.children.cycles-pp.unix_stream_data_wait
      1.78 ±  4%      +0.7        2.53 ± 21%  perf-profile.children.cycles-pp.schedule
      1.78 ±  4%      +0.8        2.54 ± 22%  perf-profile.children.cycles-pp.__schedule
     36.58            +0.8       37.42        perf-profile.children.cycles-pp.ksys_write
     35.60            +0.9       36.51        perf-profile.children.cycles-pp.vfs_write
     34.52            +1.0       35.49        perf-profile.children.cycles-pp.sock_write_iter
     33.31            +1.0       34.36        perf-profile.children.cycles-pp.unix_stream_sendmsg
      4.37 ±  2%      -0.6        3.79 ±  2%  perf-profile.self.cycles-pp._copy_to_iter
      3.94 ±  3%      -0.3        3.60        perf-profile.self.cycles-pp.clear_bhb_loop
      2.27 ±  3%      -0.3        1.98        perf-profile.self.cycles-pp.check_heap_object
      3.29 ±  2%      -0.3        3.01 ±  5%  perf-profile.self.cycles-pp.__memcg_slab_free_hook
      2.03 ±  4%      -0.3        1.76 ±  2%  perf-profile.self.cycles-pp.kmem_cache_free
      2.50 ±  3%      -0.2        2.25 ±  3%  perf-profile.self.cycles-pp.sock_wfree
      2.61 ±  2%      -0.2        2.37 ±  3%  perf-profile.self.cycles-pp.unix_stream_read_generic
      2.30 ±  3%      -0.2        2.09        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
      2.37 ±  3%      -0.2        2.18 ±  3%  perf-profile.self.cycles-pp.__slab_free
      1.04 ±  4%      -0.2        0.86 ±  5%  perf-profile.self.cycles-pp.skb_release_data
      2.19 ±  4%      -0.2        2.01 ±  2%  perf-profile.self.cycles-pp.mod_objcg_state
      1.31 ±  3%      -0.1        1.18        perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
      1.33 ±  3%      -0.1        1.21        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.04 ±  3%      -0.1        0.93        perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof
      1.13 ±  3%      -0.1        1.02        perf-profile.self.cycles-pp.__alloc_skb
      1.38 ±  2%      -0.1        1.29 ±  2%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.74 ±  3%      -0.1        0.66        perf-profile.self.cycles-pp.__skb_datagram_iter
      1.11 ±  3%      -0.1        1.03 ±  2%  perf-profile.self.cycles-pp.sock_write_iter
      0.80 ±  3%      -0.1        0.74 ±  2%  perf-profile.self.cycles-pp.write
      0.60 ±  4%      -0.1        0.54        perf-profile.self.cycles-pp.__build_skb_around
      0.84 ±  4%      -0.1        0.78        perf-profile.self.cycles-pp.sock_read_iter
      0.69 ±  3%      -0.1        0.64        perf-profile.self.cycles-pp.__check_heap_object
      0.62 ±  3%      -0.1        0.57        perf-profile.self.cycles-pp.refill_obj_stock
      0.82            -0.0        0.77        perf-profile.self.cycles-pp.read
      0.80 ±  3%      -0.0        0.75 ±  3%  perf-profile.self.cycles-pp.do_syscall_64
      0.51 ±  4%      -0.0        0.47        perf-profile.self.cycles-pp.__virt_addr_valid
      0.46 ±  2%      -0.0        0.43 ±  2%  perf-profile.self.cycles-pp.kfree
      0.59 ±  3%      -0.0        0.56 ±  2%  perf-profile.self.cycles-pp.__check_object_size
      0.44 ±  2%      -0.0        0.41        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.36 ±  3%      -0.0        0.32 ±  2%  perf-profile.self.cycles-pp.rw_verify_area
      0.43 ±  2%      -0.0        0.40 ±  2%  perf-profile.self.cycles-pp.unix_write_space
      0.37 ±  4%      -0.0        0.34 ±  2%  perf-profile.self.cycles-pp.x64_sys_call
      0.34 ±  3%      -0.0        0.31 ±  2%  perf-profile.self.cycles-pp.__cond_resched
      0.29 ±  3%      -0.0        0.27        perf-profile.self.cycles-pp.ksys_write
      0.30 ±  2%      -0.0        0.28 ±  2%  perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
      0.18 ±  4%      -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.18 ±  2%      -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.unix_destruct_scm
      0.21 ±  3%      -0.0        0.19        perf-profile.self.cycles-pp.__scm_recv_common
      0.25            -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.kmalloc_reserve
      0.15 ±  2%      -0.0        0.14        perf-profile.self.cycles-pp.skb_unlink
      0.15 ±  2%      -0.0        0.14        perf-profile.self.cycles-pp.unix_scm_to_skb
      0.07 ±  9%      +0.0        0.10 ± 19%  perf-profile.self.cycles-pp.pick_eevdf
      0.09 ±  5%      +0.0        0.13 ± 16%  perf-profile.self.cycles-pp.__switch_to
      0.11 ±  7%      +0.0        0.14 ± 10%  perf-profile.self.cycles-pp.__dequeue_entity
      0.08 ±  5%      +0.0        0.12 ± 22%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.02 ±122%      +0.0        0.06 ± 17%  perf-profile.self.cycles-pp.native_sched_clock
      0.13 ±  8%      +0.1        0.18 ± 14%  perf-profile.self.cycles-pp.__enqueue_entity
      0.00            +0.1        0.06 ±  9%  perf-profile.self.cycles-pp.vruntime_eligible
      0.27 ±  5%      +0.1        0.37 ± 15%  perf-profile.self.cycles-pp.switch_mm_irqs_off
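
For readers who want to sanity-check rows of the comparison table above, the
following is a minimal Python sketch, not part of the lkp tooling. It assumes
each row has the layout "<base> [± N%]  <change>  <new> [± N%]  <metric>", and
that a change value without a trailing "%" (as in the perf-profile rows) is an
absolute difference rather than a relative one. The names ROW, parse_row and
recompute_change are hypothetical helpers introduced only for this
illustration. Because the table prints rounded means, a recomputed value can
differ from the printed one in the last digit for some rows.

```python
import re

# Assumed row layout: <base> [± N%]  <change>  <new> [± N%]  <metric>
ROW = re.compile(
    r"^\s*(?P<base>-?[\d.]+(?:e[+-]?\d+)?)"   # mean on the parent commit
    r"(?:\s*±\s*\d+%)?"                        # optional %stddev
    r"\s+(?P<change>[+-][\d.]+%?)"             # printed change column
    r"\s+(?P<new>-?[\d.]+(?:e[+-]?\d+)?)"      # mean on the tested commit
    r"(?:\s*±\s*\d+%)?"                        # optional %stddev
    r"\s+(?P<metric>\S.*)$"                    # metric name
)


def parse_row(line):
    """Return a dict for one table row, or None if the line is not a row."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m["metric"].strip(),
        "base": float(m["base"]),
        "new": float(m["new"]),
        "change": m["change"],
        # Assumption: a trailing '%' marks a relative change; otherwise the
        # column is treated as an absolute difference.
        "relative": m["change"].endswith("%"),
    }


def recompute_change(row):
    """Recompute the change column from the two printed means."""
    delta = row["new"] - row["base"]
    if row["relative"]:
        return f"{100.0 * delta / row['base']:+.1f}%"
    return f"{delta:+.1f}"


if __name__ == "__main__":
    samples = [
        "    433157            -6.2%     406447        hackbench.throughput",
        "  39754543 ±  3%     +56.8%   62349308        hackbench.time.involuntary_context_switches",
    ]
    for line in samples:
        row = parse_row(line)
        print(row["metric"], "printed:", row["change"],
              "recomputed:", recompute_change(row))
```

For the hackbench.throughput row this gives (406447 - 433157) / 433157, which
rounds to -6.2% and matches the headline figure of the report.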


***************************************************************************************************
lkp-icl-2sp7: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/membarrier/stress-ng/60s

commit: 
  f553741ac8 ("sched: Cancel the slice protection of the idle entity")
  2ae891b826 ("sched: Reduce the default slice to avoid tasks getting an extra tick")

f553741ac8c0e467 2ae891b826958b60919ea21c727 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      1.08            -0.1        0.99        mpstat.cpu.all.irq%
     67.18 ±  2%     -11.9%      59.20 ±  5%  mpstat.max_utilization_pct
      3401 ± 19%     -31.4%       2332 ± 18%  perf-c2c.DRAM.remote
      2396 ±  3%     -23.1%       1844 ± 18%  perf-c2c.HITM.remote
     29248           +14.3%      33418        vmstat.system.cs
    788485            -9.1%     716631        vmstat.system.in
    191106            -1.7%     187946        proc-vmstat.nr_anon_pages
    535277 ±  2%      +5.6%     565009 ±  4%  proc-vmstat.numa_hit
    469052 ±  2%      +6.3%     498763 ±  5%  proc-vmstat.numa_local
     51285 ±  7%     +54.3%      79119 ± 31%  proc-vmstat.numa_pages_migrated
     51285 ±  7%     +54.3%      79119 ± 31%  proc-vmstat.pgmigrate_success
     16417 ±  7%    +131.4%      37986 ± 78%  proc-vmstat.pgreuse
    505.28           -10.6%     451.92        stress-ng.membarrier.membarrier_calls_per_sec
     97160           -10.5%      86939        stress-ng.membarrier.ops
      1618           -10.5%       1448        stress-ng.membarrier.ops_per_sec
     55094 ±  5%    +277.5%     207976 ±  9%  stress-ng.time.involuntary_context_switches
      3195 ±  2%      -8.3%       2931        stress-ng.time.percent_of_cpu_this_job_got
      1921 ±  2%      -8.3%       1761        stress-ng.time.system_time
   1047923            +5.9%    1109900        stress-ng.time.voluntary_context_switches
 5.501e+09 ±  2%      -7.8%  5.074e+09        perf-stat.i.branch-instructions
     30090           +14.4%      34431        perf-stat.i.context-switches
 1.041e+11 ±  2%      -7.6%  9.627e+10        perf-stat.i.cpu-cycles
     10683            +6.7%      11402        perf-stat.i.cpu-migrations
  2.73e+10 ±  2%      -7.6%  2.522e+10        perf-stat.i.instructions
 5.406e+09 ±  2%      -7.8%  4.985e+09        perf-stat.ps.branch-instructions
     29571           +14.4%      33836        perf-stat.ps.context-switches
 1.024e+11 ±  2%      -7.6%   9.46e+10        perf-stat.ps.cpu-cycles
     10498            +6.7%      11203        perf-stat.ps.cpu-migrations
 2.683e+10 ±  2%      -7.6%  2.478e+10        perf-stat.ps.instructions
 1.631e+12 ±  2%      -7.7%  1.505e+12        perf-stat.total.instructions
    698086 ±  4%     -12.0%     614339 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.avg
    918198 ±  7%     -13.5%     794083 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.max
    650282 ±  4%     -12.9%     566525 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.min
    698086 ±  4%     -12.0%     614339 ±  3%  sched_debug.cfs_rq:/.min_vruntime.avg
    918198 ±  7%     -13.5%     794083 ±  6%  sched_debug.cfs_rq:/.min_vruntime.max
    650282 ±  4%     -12.9%     566525 ±  4%  sched_debug.cfs_rq:/.min_vruntime.min
     13.48 ± 36%    +250.6%      47.25 ± 40%  sched_debug.cfs_rq:/.removed.load_avg.avg
     77.26 ± 17%     +91.9%     148.27 ± 24%  sched_debug.cfs_rq:/.removed.load_avg.stddev
      5.08 ± 33%    +246.5%      17.60 ± 35%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
    212.33 ± 20%     +30.1%     276.17 ±  7%  sched_debug.cfs_rq:/.removed.runnable_avg.max
     30.44 ± 21%     +89.0%      57.52 ± 14%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
      5.08 ± 33%    +246.6%      17.60 ± 35%  sched_debug.cfs_rq:/.removed.util_avg.avg
    212.25 ± 21%     +30.1%     276.08 ±  7%  sched_debug.cfs_rq:/.removed.util_avg.max
     30.43 ± 21%     +89.0%      57.51 ± 14%  sched_debug.cfs_rq:/.removed.util_avg.stddev
     15701           +12.8%      17719        sched_debug.cpu.nr_switches.avg
     11778 ±  7%     +20.3%      14165 ±  8%  sched_debug.cpu.nr_switches.min
   -202.17           +21.0%    -244.58        sched_debug.cpu.nr_uninterruptible.min
      1.43 ± 36%     -99.6%       0.01 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      0.94 ± 23%     -91.9%       0.08 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      1.60 ± 68%     -99.9%       0.00 ±223%  perf-sched.sch_delay.avg.ms.io_schedule.migration_entry_wait_on_locked.__handle_mm_fault.handle_mm_fault
      1.39 ±  8%     +71.7%       2.38 ±  7%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_global_expedited
      1.95 ±  5%     +23.7%       2.41 ±  5%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_private_expedited
      0.89 ±  4%     -16.0%       0.75 ±  3%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
      0.01 ± 25%     +75.0%       0.02 ± 34%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.06 ± 11%     -37.5%       0.04 ± 40%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.80 ±145%    +478.8%       4.62 ± 52%  perf-sched.sch_delay.max.ms.__cond_resched.__mutex_lock.constprop.0.membarrier_private_expedited
      5.29 ± 41%     -99.9%       0.01 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      6.37 ± 13%     -93.7%       0.40 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      2.22 ± 49%     -99.9%       0.00 ±223%  perf-sched.sch_delay.max.ms.io_schedule.migration_entry_wait_on_locked.__handle_mm_fault.handle_mm_fault
     10.40 ± 13%     +32.1%      13.74 ±  5%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_global_expedited
      4.55 ±  5%     -34.9%       2.96 ± 42%  perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.98 ±  4%     +33.4%       1.30 ±  6%  perf-sched.total_sch_delay.average.ms
     22.34           -12.3%      19.59        perf-sched.total_wait_and_delay.average.ms
    102076           +18.6%     121096        perf-sched.total_wait_and_delay.count.ms
     21.37           -14.4%      18.29        perf-sched.total_wait_time.average.ms
    515.07 ± 36%     +63.2%     840.46 ± 16%  perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     11.25 ±  5%     +56.4%      17.59 ±  7%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_global_expedited
     15.80           -13.5%      13.67        perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
    487.31 ±  4%     +16.4%     567.38 ±  2%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      8.00 ± 26%     +95.8%      15.67 ± 20%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1384 ± 12%     +58.1%       2188 ±  8%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_global_expedited
     10678 ±  7%    +270.1%      39521 ±  4%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_private_expedited
     85629           -12.4%      75039 ±  3%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
      2443 ± 44%     -58.3%       1018        perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      2099 ± 55%     -76.1%     501.21        perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     15.94 ±  9%     -86.6%       2.13 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
    515.06 ± 36%     +63.2%     840.45 ± 16%  perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     15.25 ±  3%     -85.0%       2.29 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
    427.24 ± 78%     -99.6%       1.55 ±107%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     10.38 ± 53%     -95.2%       0.50 ±223%  perf-sched.wait_time.avg.ms.io_schedule.migration_entry_wait_on_locked.__handle_mm_fault.handle_mm_fault
     48.58 ±185%     -94.2%       2.80 ± 99%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      9.86 ±  5%     +54.2%      15.21 ±  7%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_global_expedited
     14.92           -13.4%      12.92        perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
      1.30 ±  8%     -11.1%       1.15 ±  6%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    487.30 ±  4%     +16.4%     567.36 ±  2%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      6.13 ±141%    +268.5%      22.60 ± 17%  perf-sched.wait_time.max.ms.__cond_resched.__mutex_lock.constprop.0.membarrier_private_expedited
     25.13 ±  9%     -91.5%       2.13 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
     25.92 ± 12%     -86.1%       3.61 ±223%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      2260 ± 59%     -99.9%       3.00 ±118%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     13.02 ± 43%     -96.2%       0.50 ±223%  perf-sched.wait_time.max.ms.io_schedule.migration_entry_wait_on_locked.__handle_mm_fault.handle_mm_fault
      2443 ± 44%     -58.3%       1018        perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      2097 ± 55%     -76.1%     500.54        perf-sched.wait_time.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [tip:sched/core] [sched]  2ae891b826:  hackbench.throughput 6.2% regression
  2025-02-25  2:32 [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression kernel test robot
@ 2025-02-25  9:31 ` Chen Yu
  2025-02-25  9:45   ` Vincent Guittot
  2025-02-25 12:27   ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Chen Yu @ 2025-02-25  9:31 UTC (permalink / raw)
  To: zihan zhou
  Cc: oe-lkp, kernel test robot, lkp, linux-kernel, x86, Peter Zijlstra,
	Vincent Guittot, aubrey.li, yu.c.chen

On 2025-02-25 at 10:32:13 +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a 6.2% regression of hackbench.throughput on:
> 
> 
> commit: 2ae891b826958b60919ea21c727f77bcd6ffcc2c ("sched: Reduce the default slice to avoid tasks getting an extra tick")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
> 
> [test failed on linux-next/master d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa]
> 
> testcase: hackbench
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
> 
> 	nr_threads: 100%
> 	iterations: 4
> 	mode: process
> 	ipc: socket
> 	cpufreq_governor: performance
> 
> 
>   39754543 ±  3%     +56.8%   62349308        hackbench.time.involuntary_context_switches
>

This patch shrinks the base_slice so the deadline is reached earlier to trigger the
tick preemption IIUC. For the hackbench case, my assumption is that hackbench seems to
encounter more wakeup preemption, which hurts throughput. If more frequent tick preemption
is needed, but more frequent wakeup preemption is not, are we able to do this base_slice
shrink for tick preemption only rather than the wakeup preemption? A wild guess: can we
use a smaller base_slice of 0.7 in update_deadline() for tick preemption, but keep the old
value of 0.75 in update_deadline() for wakeup preemption during enqueue?

But considering that the 6% regression is not that high, and that the user can customize
base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
future (we have encountered some SPECjbb regressions due to over-preemption).
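
For reference, a minimal sketch of reading and overriding the base slice
through debugfs. The path /sys/kernel/debug/sched/base_slice_ns and the
helper names are assumptions for illustration (check your kernel); writing
requires root and a mounted debugfs.

```python
from pathlib import Path

# Assumed location of the knob; verify it exists on the kernel under test.
BASE_SLICE_PATH = Path("/sys/kernel/debug/sched/base_slice_ns")


def read_base_slice_ns(path=BASE_SLICE_PATH):
    """Return the current base slice in nanoseconds."""
    return int(Path(path).read_text().strip())


def write_base_slice_ns(value_ns, path=BASE_SLICE_PATH):
    """Override the base slice (in nanoseconds) for the whole system."""
    Path(path).write_text(f"{value_ns}\n")
```

For example, write_base_slice_ns(3_000_000) would restore the previous
3ms value on a machine with 8 or more CPUs for an A/B comparison.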

thanks,
Chenyu


* Re: [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
  2025-02-25  9:31 ` Chen Yu
@ 2025-02-25  9:45   ` Vincent Guittot
  2025-02-25 10:15     ` Chen Yu
  2025-02-25 12:27   ` Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: Vincent Guittot @ 2025-02-25  9:45 UTC (permalink / raw)
  To: Chen Yu
  Cc: zihan zhou, oe-lkp, kernel test robot, lkp, linux-kernel, x86,
	Peter Zijlstra, aubrey.li, yu.c.chen

On Tue, 25 Feb 2025 at 10:31, Chen Yu <yu.chen.surf@foxmail.com> wrote:
>
> On 2025-02-25 at 10:32:13 +0800, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 6.2% regression of hackbench.throughput on:
> >
> >
> > commit: 2ae891b826958b60919ea21c727f77bcd6ffcc2c ("sched: Reduce the default slice to avoid tasks getting an extra tick")
> > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
> >
> > [test failed on linux-next/master d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa]
> >
> > testcase: hackbench
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > parameters:
> >
> >       nr_threads: 100%
> >       iterations: 4
> >       mode: process
> >       ipc: socket
> >       cpufreq_governor: performance
> >
> >
> >   39754543 ±  3%     +56.8%   62349308        hackbench.time.involuntary_context_switches
> >
>
> This patch shrinks the base_slice so the deadline is reached earlier to trigger the
> tick preemption IIUC. For the hackbench case, my assumption is that hackbench seems to

For systems with 8 or more CPUs, the base slice was
0.75*(1+ilog2(8)) = 3ms, which is exactly 3 tick periods at 1000 Hz. But
because the tick period is almost never fully accounted to the task,
the task was running 4 tick periods instead of 3. The normalized
base_slice has been reduced from 0.75ms to 0.70ms, so the base slice
becomes 2.8ms for 8 CPUs and more, and the main result is that tasks
will run 3 tick periods instead of 4.
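
The arithmetic above can be sketched as follows. This is a simplified
model, not kernel code: the function names are made up for illustration,
and the 0.99ms charged per 1ms tick is a hypothetical figure standing in
for "the tick period is almost never fully accounted to the task".

```python
def base_slice_ns(nr_cpus, normalized_ns):
    """Scale the normalized slice by 1 + ilog2(min(nr_cpus, 8))."""
    ilog2 = min(nr_cpus, 8).bit_length() - 1  # integer log2
    return normalized_ns * (1 + ilog2)


def ticks_until_preempt(slice_ns, accounted_per_tick_ns):
    """Count ticks until the runtime charged to the task reaches the slice."""
    ticks, runtime = 0, 0
    while runtime < slice_ns:
        runtime += accounted_per_tick_ns
        ticks += 1
    return ticks


# Hypothetical: only 0.99ms of each 1ms tick is charged to the task.
ACCOUNTED_PER_TICK_NS = 990_000

for normalized_ns in (750_000, 700_000):
    slice_ns = base_slice_ns(128, normalized_ns)
    ticks = ticks_until_preempt(slice_ns, ACCOUNTED_PER_TICK_NS)
    print(normalized_ns, slice_ns, ticks)
```

With a 3ms slice, three ticks charge only 2.97ms, so the task survives
until the fourth tick; with 2.8ms, the third tick already crosses the slice.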

> encounter more wakeup preemption, which hurts throughput. If more frequent tick preemption
> is needed, but more frequent wakeup preemption is not, are we able to do this base_slice
> shrink for tick preemption only rather than the wakeup preemption? A wild guess: can we
> use a smaller base_slice of 0.7 in update_deadline() for tick preemption, but keep the old
> value of 0.75 in update_deadline() for wakeup preemption during enqueue?
>
> But considering that the 6% regression is not that high, and that the user can customize
> base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> future (we have encountered some SPECjbb regressions due to over-preemption).
>
> thanks,
> Chenyu
>


* Re: [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
  2025-02-25  9:45   ` Vincent Guittot
@ 2025-02-25 10:15     ` Chen Yu
  0 siblings, 0 replies; 9+ messages in thread
From: Chen Yu @ 2025-02-25 10:15 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: zihan zhou, oe-lkp, kernel test robot, lkp, linux-kernel, x86,
	Peter Zijlstra, aubrey.li, yu.c.chen, yu.chen.surf

On 2025-02-25 at 10:45:35 +0100, Vincent Guittot wrote:
> On Tue, 25 Feb 2025 at 10:31, Chen Yu <yu.chen.surf@foxmail.com> wrote:
> >
> > On 2025-02-25 at 10:32:13 +0800, kernel test robot wrote:
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a 6.2% regression of hackbench.throughput on:
> > >
> > >
> > > commit: 2ae891b826958b60919ea21c727f77bcd6ffcc2c ("sched: Reduce the default slice to avoid tasks getting an extra tick")
> > > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
> > >
> > > [test failed on linux-next/master d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa]
> > >
> > > testcase: hackbench
> > > config: x86_64-rhel-9.4
> > > compiler: gcc-12
> > > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > > parameters:
> > >
> > >       nr_threads: 100%
> > >       iterations: 4
> > >       mode: process
> > >       ipc: socket
> > >       cpufreq_governor: performance
> > >
> > >
> > >   39754543 ±  3%     +56.8%   62349308        hackbench.time.involuntary_context_switches
> > >
> >
> > This patch shrinks the base_slice so the deadline is reached earlier to trigger the
> > tick preemption IIUC. For the hackbench case, my assumption is that hackbench seems to
> 
> For systems with 8 or more CPUs, the base slice was
> 0.75*(1+ilog2(8)) = 3ms, which is exactly 3 tick periods at 1000 Hz. But
> because the tick period is almost never fully accounted to the task,
> the task was running 4 tick periods instead of 3. The normalized
> base_slice has been reduced from 0.75ms to 0.70ms, so the base slice
> becomes 2.8ms for 8 CPUs and more, and the main result is that tasks
> will run 3 tick periods instead of 4.
>

Thanks for the detailed explanation; I now understand the background.
It is a correct fix for tick preemption, and it slightly affects wakeup
preemption (smaller deadline in place_entity()).

thanks,
Chenyu
 
> > encounter more wakeup preemption, which hurts throughput. If more frequent tick preemption
> > is needed, but more frequent wakeup preemption is not, are we able to do this base_slice
> > shrink for tick preemption only rather than the wakeup preemption? A wild guess: can we
> > use a smaller base_slice of 0.7 in update_deadline() for tick preemption, but keep the old
> > value of 0.75 in update_deadline() for wakeup preemption during enqueue?
> >
> > But considering that the 6% regression is not that high, and that the user can customize
> > base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> > future (we have encountered some SPECjbb regressions due to over-preemption).
> >
> > thanks,
> > Chenyu
> >



* Re: [tip:sched/core] [sched]  2ae891b826:  hackbench.throughput 6.2% regression
  2025-02-25  9:31 ` Chen Yu
  2025-02-25  9:45   ` Vincent Guittot
@ 2025-02-25 12:27   ` Peter Zijlstra
  2025-02-25 13:15     ` Chen Yu
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2025-02-25 12:27 UTC (permalink / raw)
  To: Chen Yu
  Cc: zihan zhou, oe-lkp, kernel test robot, lkp, linux-kernel, x86,
	Vincent Guittot, aubrey.li, yu.c.chen

On Tue, Feb 25, 2025 at 05:31:34PM +0800, Chen Yu wrote:

> 
> But considering that the 6% regression is not that high, and that the user can customize
> base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> future (we have encountered some SPECjbb regressions due to over-preemption).

You can specify a per-task slice using sched_attr::sched_runtime. Also
see commit 857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to
set request/slice suggestion")
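
A minimal userspace sketch of that suggestion, under stated assumptions:
the syscall numbers are the x86_64 ones, the structure is the first
48-byte layout of struct sched_attr, and set_task_slice() is a made-up
helper name. Whether sched_runtime is honoured for a fair task depends on
the kernel carrying the commit referenced above.

```python
import ctypes
import os

# x86_64 syscall numbers (assumption: other architectures differ).
SYS_SCHED_SETATTR = 314
SYS_SCHED_GETATTR = 315


class SchedAttr(ctypes.Structure):
    """First-version layout of struct sched_attr (48 bytes)."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("sched_policy", ctypes.c_uint32),
        ("sched_flags", ctypes.c_uint64),
        ("sched_nice", ctypes.c_int32),
        ("sched_priority", ctypes.c_uint32),
        ("sched_runtime", ctypes.c_uint64),
        ("sched_deadline", ctypes.c_uint64),
        ("sched_period", ctypes.c_uint64),
    ]


def _check(ret):
    """Raise OSError with errno if the raw syscall failed."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def set_task_slice(pid, slice_ns):
    """Request a per-task slice in nanoseconds; pid 0 is the calling thread."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    attr = SchedAttr()
    # Read the current attributes first so policy and nice are preserved.
    _check(libc.syscall(ctypes.c_long(SYS_SCHED_GETATTR), ctypes.c_long(pid),
                        ctypes.byref(attr),
                        ctypes.c_ulong(ctypes.sizeof(attr)),
                        ctypes.c_ulong(0)))
    attr.size = ctypes.sizeof(attr)
    attr.sched_runtime = slice_ns
    _check(libc.syscall(ctypes.c_long(SYS_SCHED_SETATTR), ctypes.c_long(pid),
                        ctypes.byref(attr), ctypes.c_ulong(0)))
```

For example, set_task_slice(0, 3_000_000) would ask for a 3ms slice for
the calling thread before it starts the benchmark workload.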




* Re: [tip:sched/core] [sched]  2ae891b826:  hackbench.throughput 6.2% regression
  2025-02-25 12:27   ` Peter Zijlstra
@ 2025-02-25 13:15     ` Chen Yu
  2025-02-25 13:42       ` Qais Yousef
  0 siblings, 1 reply; 9+ messages in thread
From: Chen Yu @ 2025-02-25 13:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: zihan zhou, oe-lkp, kernel test robot, lkp, linux-kernel, x86,
	Vincent Guittot, aubrey.li, yu.c.chen

On 2025-02-25 at 13:27:05 +0100, Peter Zijlstra wrote:
> On Tue, Feb 25, 2025 at 05:31:34PM +0800, Chen Yu wrote:
> 
> > 
> > But considering that the 6% regression is not that high, and that the user can customize
> > base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> > future (we have encountered some SPECjbb regressions due to over-preemption).
> 
> You can specify a per-task slice using sched_attr::sched_runtime. Also
> see commit 857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to
> set request/slice suggestion")
> 
>

Thanks, we'll have a try during the next test cycle.

thanks,
Chenyu



* Re: [tip:sched/core] [sched]  2ae891b826:  hackbench.throughput 6.2% regression
  2025-02-25 13:15     ` Chen Yu
@ 2025-02-25 13:42       ` Qais Yousef
  2025-02-25 15:35         ` Chen Yu
  0 siblings, 1 reply; 9+ messages in thread
From: Qais Yousef @ 2025-02-25 13:42 UTC (permalink / raw)
  To: Chen Yu
  Cc: Peter Zijlstra, zihan zhou, oe-lkp, kernel test robot, lkp,
	linux-kernel, x86, Vincent Guittot, aubrey.li, yu.c.chen

On 02/25/25 21:15, Chen Yu wrote:
> On 2025-02-25 at 13:27:05 +0100, Peter Zijlstra wrote:
> > On Tue, Feb 25, 2025 at 05:31:34PM +0800, Chen Yu wrote:
> > 
> > > 
> > > But considering that the 6% regression is not that high, and that the user can customize
> > > base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> > > future (we have encountered some SPECjbb regressions due to over-preemption).
> > 
> > You can specify a per-task slice using sched_attr::sched_runtime. Also
> > see commit 857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to
> > set request/slice suggestion")
> > 
> >
> 
> Thanks, we'll have a try during the next test cycle.

Could you also try with HRTICK enabled?

	echo HRTICK | sudo tee /sys/kernel/debug/sched/features


* Re: [tip:sched/core] [sched]  2ae891b826:  hackbench.throughput 6.2% regression
  2025-02-25 13:42       ` Qais Yousef
@ 2025-02-25 15:35         ` Chen Yu
  2025-02-25 23:10           ` Qais Yousef
  0 siblings, 1 reply; 9+ messages in thread
From: Chen Yu @ 2025-02-25 15:35 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Peter Zijlstra, zihan zhou, oe-lkp, kernel test robot, lkp,
	linux-kernel, x86, Vincent Guittot, aubrey.li, yu.c.chen

On 2025-02-25 at 13:42:20 +0000, Qais Yousef wrote:
> On 02/25/25 21:15, Chen Yu wrote:
> > On 2025-02-25 at 13:27:05 +0100, Peter Zijlstra wrote:
> > > On Tue, Feb 25, 2025 at 05:31:34PM +0800, Chen Yu wrote:
> > > 
> > > > 
> > > > But considering that the 6% regression is not that high, and that the user can customize
> > > > base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> > > > future (we have encountered some SPECjbb regressions due to over-preemption).
> > > 
> > > You can specify a per-task slice using sched_attr::sched_runtime. Also
> > > see commit 857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to
> > > set request/slice suggestion")
> > > 
> > >
> > 
> > Thanks, we'll have a try during the next test cycle.
> 
> Could you also try with HRTICK enabled?
> 
> 	echo HRTICK | sudo tee /sys/kernel/debug/sched/features

Sure. Is HRTICK supposed to encourage preemption? I thought
hackbench might be sensitive to preemption (downgrading).

thanks,
Chenyu



* Re: [tip:sched/core] [sched]  2ae891b826:  hackbench.throughput 6.2% regression
  2025-02-25 15:35         ` Chen Yu
@ 2025-02-25 23:10           ` Qais Yousef
  0 siblings, 0 replies; 9+ messages in thread
From: Qais Yousef @ 2025-02-25 23:10 UTC (permalink / raw)
  To: Chen Yu
  Cc: Peter Zijlstra, zihan zhou, oe-lkp, kernel test robot, lkp,
	linux-kernel, x86, Vincent Guittot, aubrey.li, yu.c.chen

On 02/25/25 23:35, Chen Yu wrote:
> On 2025-02-25 at 13:42:20 +0000, Qais Yousef wrote:
> > On 02/25/25 21:15, Chen Yu wrote:
> > > On 2025-02-25 at 13:27:05 +0100, Peter Zijlstra wrote:
> > > > On Tue, Feb 25, 2025 at 05:31:34PM +0800, Chen Yu wrote:
> > > > 
> > > > > 
> > > > > But considering that the 6% regression is not that high, and that the user can customize
> > > > > base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> > > > > future (we have encountered some SPECjbb regressions due to over-preemption).
> > > > 
> > > > You can specify a per-task slice using sched_attr::sched_runtime. Also
> > > > see commit 857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to
> > > > set request/slice suggestion")
> > > > 
> > > >
> > > 
> > > Thanks, we'll have a try during the next test cycle.
> > 
> > Could you also try with HRTICK enabled?
> > 
> > 	echo HRTICK | sudo tee /sys/kernel/debug/sched/features
> 
> Sure. Is HRTICK supposed to encourage preemption? I thought
> hackbench might be sensitive to preemption (downgrading).

Yes, my bad. I don't know why I read this as preemption happening later rather
than earlier. Please ignore me then.

Thanks

--
Qais Yousef
