* [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
@ 2025-02-25 2:32 kernel test robot
2025-02-25 9:31 ` Chen Yu
0 siblings, 1 reply; 9+ messages in thread
From: kernel test robot @ 2025-02-25 2:32 UTC (permalink / raw)
To: zihan zhou
Cc: oe-lkp, lkp, linux-kernel, x86, Peter Zijlstra, Vincent Guittot,
aubrey.li, yu.c.chen, oliver.sang
Hello,
kernel test robot noticed a 6.2% regression of hackbench.throughput on:
commit: 2ae891b826958b60919ea21c727f77bcd6ffcc2c ("sched: Reduce the default slice to avoid tasks getting an extra tick")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
[test failed on linux-next/master d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa]
testcase: hackbench
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
iterations: 4
mode: process
ipc: socket
cpufreq_governor: performance
In addition to that, the commit also has a significant impact on the following tests:
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.membarrier.ops_per_sec 10.5% regression |
| test machine | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | nr_threads=100% |
| | test=membarrier |
| | testtime=60s |
+------------------+-------------------------------------------------------------------------------------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202502251026.bb927780-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250225/202502251026.bb927780-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
gcc-12/performance/socket/4/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
commit:
f553741ac8 ("sched: Cancel the slice protection of the idle entity")
2ae891b826 ("sched: Reduce the default slice to avoid tasks getting an extra tick")
f553741ac8c0e467 2ae891b826958b60919ea21c727
---------------- ---------------------------
%stddev %change %stddev
\ | \
5457 ± 6% +30.9% 7146 ± 11% perf-c2c.DRAM.remote
1156 ± 17% +76.3% 2038 ± 19% perf-c2c.HITM.remote
790654 ± 2% +22.8% 971104 sched_debug.cpu.nr_switches.avg
659209 ± 2% +24.6% 821703 ± 3% sched_debug.cpu.nr_switches.min
1706905 +20.0% 2047861 vmstat.system.cs
296017 +5.8% 313318 ± 2% vmstat.system.in
15076 ± 48% +121.3% 33360 ± 35% proc-vmstat.numa_pages_migrated
3389933 ± 5% +15.3% 3907919 ± 3% proc-vmstat.pgalloc_normal
2565152 ± 6% +27.9% 3280218 ± 5% proc-vmstat.pgfree
15076 ± 48% +121.3% 33360 ± 35% proc-vmstat.pgmigrate_success
781.28 ± 57% -100.0% 0.08 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
3394 ± 51% -100.0% 0.08 ±223% perf-sched.sch_delay.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
0.18 ± 74% +3280.0% 6.22 ±125% perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
42.40 ± 41% -62.7% 15.83 ± 60% perf-sched.wait_and_delay.count.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
86.80 ± 42% -89.4% 9.17 ± 97% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_anon_vmas
977.49 ± 51% -99.9% 0.95 ±223% perf-sched.wait_time.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
3397 ± 50% -100.0% 0.95 ±223% perf-sched.wait_time.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
433157 -6.2% 406447 hackbench.throughput
423258 -6.9% 394005 hackbench.throughput_avg
433157 -6.2% 406447 hackbench.throughput_best
411374 -6.8% 383238 hackbench.throughput_worst
143.13 +7.3% 153.65 hackbench.time.elapsed_time
143.13 +7.3% 153.65 hackbench.time.elapsed_time.max
39754543 ± 3% +56.8% 62349308 hackbench.time.involuntary_context_switches
623881 +3.9% 648284 hackbench.time.minor_page_faults
17045 +7.7% 18350 hackbench.time.system_time
900.50 +2.5% 922.71 hackbench.time.user_time
2.019e+08 +23.3% 2.489e+08 hackbench.time.voluntary_context_switches
1.61 -2.3% 1.57 perf-stat.i.MPKI
4.411e+10 -5.0% 4.192e+10 perf-stat.i.branch-instructions
0.41 ± 2% +0.0 0.44 perf-stat.i.branch-miss-rate%
1.744e+08 +1.6% 1.772e+08 perf-stat.i.branch-misses
25.15 -0.6 24.50 perf-stat.i.cache-miss-rate%
3.5e+08 -7.0% 3.255e+08 perf-stat.i.cache-misses
1.398e+09 -3.8% 1.346e+09 perf-stat.i.cache-references
1677956 ± 2% +20.8% 2027400 perf-stat.i.context-switches
1.49 +5.6% 1.57 perf-stat.i.cpi
46084 ± 8% +44.6% 66621 ± 8% perf-stat.i.cpu-migrations
935.91 +8.3% 1013 perf-stat.i.cycles-between-cache-misses
2.175e+11 -5.1% 2.065e+11 perf-stat.i.instructions
0.68 -5.2% 0.64 perf-stat.i.ipc
13.38 ± 2% +21.7% 16.28 perf-stat.i.metric.K/sec
1.61 -2.0% 1.58 perf-stat.overall.MPKI
0.39 +0.0 0.42 perf-stat.overall.branch-miss-rate%
25.05 -0.8 24.23 perf-stat.overall.cache-miss-rate%
1.49 +5.5% 1.57 perf-stat.overall.cpi
926.46 +7.6% 996.92 perf-stat.overall.cycles-between-cache-misses
0.67 -5.2% 0.64 perf-stat.overall.ipc
4.382e+10 -5.0% 4.164e+10 perf-stat.ps.branch-instructions
1.73e+08 +1.5% 1.755e+08 perf-stat.ps.branch-misses
3.475e+08 -7.0% 3.233e+08 perf-stat.ps.cache-misses
1.387e+09 -3.8% 1.334e+09 perf-stat.ps.cache-references
1662988 ± 2% +20.6% 2004942 perf-stat.ps.context-switches
44600 ± 8% +43.7% 64072 ± 7% perf-stat.ps.cpu-migrations
2.161e+11 -5.1% 2.051e+11 perf-stat.ps.instructions
3.105e+13 +2.0% 3.169e+13 perf-stat.total.instructions
8.54 ± 2% -1.0 7.54 perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
8.46 ± 2% -1.0 7.47 perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
8.30 ± 2% -1.0 7.31 perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
4.38 ± 2% -0.6 3.81 ± 2% perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
3.20 ± 3% -0.3 2.85 perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
3.00 ± 3% -0.3 2.67 perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
3.40 ± 3% -0.3 3.10 ± 3% perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
2.30 ± 3% -0.3 2.00 perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
3.25 ± 3% -0.3 2.97 ± 3% perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
3.07 ± 2% -0.3 2.79 ± 2% perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
3.05 ± 3% -0.3 2.79 ± 3% perf-profile.calltrace.cycles-pp.sock_wfree.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic
2.50 ± 3% -0.2 2.29 ± 2% perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
2.18 ± 3% -0.2 1.99 perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
1.99 ± 3% -0.2 1.82 perf-profile.calltrace.cycles-pp.clear_bhb_loop.write
1.95 ± 4% -0.2 1.78 ± 2% perf-profile.calltrace.cycles-pp.clear_bhb_loop.read
2.68 ± 3% -0.1 2.54 ± 2% perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
1.55 ± 3% -0.1 1.42 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read
1.55 ± 3% -0.1 1.42 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
1.35 ± 3% -0.1 1.24 ± 3% perf-profile.calltrace.cycles-pp.__slab_free.kfree.skb_release_data.consume_skb.unix_stream_read_generic
1.04 ± 3% -0.1 0.96 ± 3% perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
1.12 ± 3% -0.1 1.04 perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
0.62 ± 4% -0.1 0.56 perf-profile.calltrace.cycles-pp.__build_skb_around.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
0.72 ± 3% -0.1 0.66 perf-profile.calltrace.cycles-pp.mod_objcg_state.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
0.63 ± 2% -0.1 0.57 ± 3% perf-profile.calltrace.cycles-pp.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
0.57 ± 3% -0.0 0.52 ± 2% perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter
1.17 ± 3% +0.2 1.32 ± 6% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.42 ± 50% +0.3 0.76 ± 22% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
1.36 ± 3% +0.5 1.88 ± 21% perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic
1.38 ± 3% +0.5 1.91 ± 21% perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
1.43 ± 3% +0.5 1.98 ± 21% perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
1.63 ± 3% +0.7 2.28 ± 21% perf-profile.calltrace.cycles-pp.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
36.49 +0.8 37.34 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
35.51 +0.9 36.43 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
38.59 +0.9 39.52 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
38.32 +1.0 39.27 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
34.44 +1.0 35.42 perf-profile.calltrace.cycles-pp.sock_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
33.04 +1.1 34.12 perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write.do_syscall_64
8.58 ± 2% -1.0 7.58 perf-profile.children.cycles-pp.unix_stream_read_actor
8.35 ± 2% -1.0 7.36 perf-profile.children.cycles-pp.__skb_datagram_iter
8.50 ± 2% -1.0 7.51 perf-profile.children.cycles-pp.skb_copy_datagram_iter
4.40 ± 2% -0.6 3.83 ± 2% perf-profile.children.cycles-pp._copy_to_iter
5.77 ± 2% -0.4 5.32 ± 3% perf-profile.children.cycles-pp.__memcg_slab_free_hook
4.41 ± 3% -0.4 3.98 perf-profile.children.cycles-pp.__check_object_size
4.80 ± 3% -0.4 4.40 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
3.24 ± 3% -0.4 2.89 perf-profile.children.cycles-pp.simple_copy_to_iter
2.98 ± 3% -0.3 2.64 perf-profile.children.cycles-pp.check_heap_object
3.98 ± 3% -0.3 3.64 perf-profile.children.cycles-pp.clear_bhb_loop
3.44 ± 2% -0.3 3.14 ± 3% perf-profile.children.cycles-pp.skb_release_head_state
3.31 ± 2% -0.3 3.03 ± 3% perf-profile.children.cycles-pp.unix_destruct_scm
3.09 ± 3% -0.3 2.82 ± 3% perf-profile.children.cycles-pp.sock_wfree
2.42 ± 3% -0.2 2.23 ± 3% perf-profile.children.cycles-pp.__slab_free
2.59 ± 2% -0.2 2.42 ± 2% perf-profile.children.cycles-pp.mod_objcg_state
1.78 ± 3% -0.2 1.62 perf-profile.children.cycles-pp.entry_SYSCALL_64
2.76 ± 3% -0.1 2.61 ± 2% perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
1.38 ± 3% -0.1 1.25 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
1.30 ± 4% -0.1 1.19 perf-profile.children.cycles-pp.obj_cgroup_charge
0.65 ± 4% -0.1 0.57 perf-profile.children.cycles-pp.__build_skb_around
0.66 ± 3% -0.1 0.61 perf-profile.children.cycles-pp.refill_obj_stock
0.73 ± 3% -0.1 0.68 perf-profile.children.cycles-pp.__check_heap_object
0.59 ± 3% -0.1 0.54 ± 2% perf-profile.children.cycles-pp.rw_verify_area
0.66 ± 2% -0.1 0.61 ± 3% perf-profile.children.cycles-pp.skb_unlink
0.55 ± 4% -0.0 0.51 ± 2% perf-profile.children.cycles-pp.__virt_addr_valid
0.28 ± 3% -0.0 0.26 perf-profile.children.cycles-pp.__scm_recv_common
0.16 ± 4% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.is_vmalloc_addr
0.16 ± 3% -0.0 0.14 ± 2% perf-profile.children.cycles-pp.security_socket_recvmsg
0.17 ± 2% -0.0 0.16 perf-profile.children.cycles-pp.put_pid
0.14 ± 3% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.manage_oob
0.11 -0.0 0.10 perf-profile.children.cycles-pp.wait_for_unix_gc
0.06 ± 6% +0.0 0.08 ± 11% perf-profile.children.cycles-pp.os_xsave
0.20 ± 3% +0.0 0.23 ± 7% perf-profile.children.cycles-pp.__get_user_8
0.06 ± 6% +0.0 0.09 ± 17% perf-profile.children.cycles-pp.sched_clock
0.06 ± 6% +0.0 0.09 ± 14% perf-profile.children.cycles-pp.check_preempt_wakeup_fair
0.09 ± 5% +0.0 0.13 ± 18% perf-profile.children.cycles-pp.__switch_to
0.08 ± 4% +0.0 0.12 ± 21% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.15 ± 6% +0.0 0.20 ± 10% perf-profile.children.cycles-pp.__dequeue_entity
0.25 ± 3% +0.0 0.29 ± 9% perf-profile.children.cycles-pp.rseq_ip_fixup
0.09 ± 10% +0.0 0.14 ± 15% perf-profile.children.cycles-pp.pick_eevdf
0.13 ± 7% +0.0 0.18 ± 14% perf-profile.children.cycles-pp.__enqueue_entity
0.08 ± 10% +0.0 0.12 ± 27% perf-profile.children.cycles-pp.wakeup_preempt
0.01 ±200% +0.1 0.06 ± 11% perf-profile.children.cycles-pp.vruntime_eligible
0.01 ±200% +0.1 0.07 ± 23% perf-profile.children.cycles-pp.___perf_sw_event
0.01 ±200% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.put_prev_entity
0.31 ± 2% +0.1 0.38 ± 12% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.22 ± 7% +0.1 0.30 ± 12% perf-profile.children.cycles-pp.set_next_entity
0.14 ± 5% +0.1 0.22 ± 22% perf-profile.children.cycles-pp.pick_task_fair
0.14 ± 44% +0.1 0.24 ± 15% perf-profile.children.cycles-pp.get_any_partial
0.27 ± 5% +0.1 0.37 ± 15% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.33 ± 4% +0.1 0.47 ± 22% perf-profile.children.cycles-pp.enqueue_entity
0.30 ± 4% +0.2 0.46 ± 26% perf-profile.children.cycles-pp.update_load_avg
0.48 ± 4% +0.2 0.72 ± 26% perf-profile.children.cycles-pp.enqueue_task_fair
0.51 ± 3% +0.2 0.75 ± 27% perf-profile.children.cycles-pp.enqueue_task
0.48 ± 6% +0.3 0.75 ± 23% perf-profile.children.cycles-pp.pick_next_task_fair
0.49 ± 6% +0.3 0.76 ± 24% perf-profile.children.cycles-pp.__pick_next_task
0.60 ± 4% +0.3 0.89 ± 25% perf-profile.children.cycles-pp.ttwu_do_activate
1.67 ± 2% +0.6 2.23 ± 20% perf-profile.children.cycles-pp.schedule_timeout
1.64 ± 3% +0.7 2.29 ± 21% perf-profile.children.cycles-pp.unix_stream_data_wait
1.78 ± 4% +0.7 2.53 ± 21% perf-profile.children.cycles-pp.schedule
1.78 ± 4% +0.8 2.54 ± 22% perf-profile.children.cycles-pp.__schedule
36.58 +0.8 37.42 perf-profile.children.cycles-pp.ksys_write
35.60 +0.9 36.51 perf-profile.children.cycles-pp.vfs_write
34.52 +1.0 35.49 perf-profile.children.cycles-pp.sock_write_iter
33.31 +1.0 34.36 perf-profile.children.cycles-pp.unix_stream_sendmsg
4.37 ± 2% -0.6 3.79 ± 2% perf-profile.self.cycles-pp._copy_to_iter
3.94 ± 3% -0.3 3.60 perf-profile.self.cycles-pp.clear_bhb_loop
2.27 ± 3% -0.3 1.98 perf-profile.self.cycles-pp.check_heap_object
3.29 ± 2% -0.3 3.01 ± 5% perf-profile.self.cycles-pp.__memcg_slab_free_hook
2.03 ± 4% -0.3 1.76 ± 2% perf-profile.self.cycles-pp.kmem_cache_free
2.50 ± 3% -0.2 2.25 ± 3% perf-profile.self.cycles-pp.sock_wfree
2.61 ± 2% -0.2 2.37 ± 3% perf-profile.self.cycles-pp.unix_stream_read_generic
2.30 ± 3% -0.2 2.09 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
2.37 ± 3% -0.2 2.18 ± 3% perf-profile.self.cycles-pp.__slab_free
1.04 ± 4% -0.2 0.86 ± 5% perf-profile.self.cycles-pp.skb_release_data
2.19 ± 4% -0.2 2.01 ± 2% perf-profile.self.cycles-pp.mod_objcg_state
1.31 ± 3% -0.1 1.18 perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
1.33 ± 3% -0.1 1.21 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
1.04 ± 3% -0.1 0.93 perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof
1.13 ± 3% -0.1 1.02 perf-profile.self.cycles-pp.__alloc_skb
1.38 ± 2% -0.1 1.29 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.74 ± 3% -0.1 0.66 perf-profile.self.cycles-pp.__skb_datagram_iter
1.11 ± 3% -0.1 1.03 ± 2% perf-profile.self.cycles-pp.sock_write_iter
0.80 ± 3% -0.1 0.74 ± 2% perf-profile.self.cycles-pp.write
0.60 ± 4% -0.1 0.54 perf-profile.self.cycles-pp.__build_skb_around
0.84 ± 4% -0.1 0.78 perf-profile.self.cycles-pp.sock_read_iter
0.69 ± 3% -0.1 0.64 perf-profile.self.cycles-pp.__check_heap_object
0.62 ± 3% -0.1 0.57 perf-profile.self.cycles-pp.refill_obj_stock
0.82 -0.0 0.77 perf-profile.self.cycles-pp.read
0.80 ± 3% -0.0 0.75 ± 3% perf-profile.self.cycles-pp.do_syscall_64
0.51 ± 4% -0.0 0.47 perf-profile.self.cycles-pp.__virt_addr_valid
0.46 ± 2% -0.0 0.43 ± 2% perf-profile.self.cycles-pp.kfree
0.59 ± 3% -0.0 0.56 ± 2% perf-profile.self.cycles-pp.__check_object_size
0.44 ± 2% -0.0 0.41 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.36 ± 3% -0.0 0.32 ± 2% perf-profile.self.cycles-pp.rw_verify_area
0.43 ± 2% -0.0 0.40 ± 2% perf-profile.self.cycles-pp.unix_write_space
0.37 ± 4% -0.0 0.34 ± 2% perf-profile.self.cycles-pp.x64_sys_call
0.34 ± 3% -0.0 0.31 ± 2% perf-profile.self.cycles-pp.__cond_resched
0.29 ± 3% -0.0 0.27 perf-profile.self.cycles-pp.ksys_write
0.30 ± 2% -0.0 0.28 ± 2% perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
0.18 ± 4% -0.0 0.16 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.18 ± 2% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.unix_destruct_scm
0.21 ± 3% -0.0 0.19 perf-profile.self.cycles-pp.__scm_recv_common
0.25 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.kmalloc_reserve
0.15 ± 2% -0.0 0.14 perf-profile.self.cycles-pp.skb_unlink
0.15 ± 2% -0.0 0.14 perf-profile.self.cycles-pp.unix_scm_to_skb
0.07 ± 9% +0.0 0.10 ± 19% perf-profile.self.cycles-pp.pick_eevdf
0.09 ± 5% +0.0 0.13 ± 16% perf-profile.self.cycles-pp.__switch_to
0.11 ± 7% +0.0 0.14 ± 10% perf-profile.self.cycles-pp.__dequeue_entity
0.08 ± 5% +0.0 0.12 ± 22% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.02 ±122% +0.0 0.06 ± 17% perf-profile.self.cycles-pp.native_sched_clock
0.13 ± 8% +0.1 0.18 ± 14% perf-profile.self.cycles-pp.__enqueue_entity
0.00 +0.1 0.06 ± 9% perf-profile.self.cycles-pp.vruntime_eligible
0.27 ± 5% +0.1 0.37 ± 15% perf-profile.self.cycles-pp.switch_mm_irqs_off
***************************************************************************************************
lkp-icl-2sp7: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/membarrier/stress-ng/60s
commit:
f553741ac8 ("sched: Cancel the slice protection of the idle entity")
2ae891b826 ("sched: Reduce the default slice to avoid tasks getting an extra tick")
f553741ac8c0e467 2ae891b826958b60919ea21c727
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.08 -0.1 0.99 mpstat.cpu.all.irq%
67.18 ± 2% -11.9% 59.20 ± 5% mpstat.max_utilization_pct
3401 ± 19% -31.4% 2332 ± 18% perf-c2c.DRAM.remote
2396 ± 3% -23.1% 1844 ± 18% perf-c2c.HITM.remote
29248 +14.3% 33418 vmstat.system.cs
788485 -9.1% 716631 vmstat.system.in
191106 -1.7% 187946 proc-vmstat.nr_anon_pages
535277 ± 2% +5.6% 565009 ± 4% proc-vmstat.numa_hit
469052 ± 2% +6.3% 498763 ± 5% proc-vmstat.numa_local
51285 ± 7% +54.3% 79119 ± 31% proc-vmstat.numa_pages_migrated
51285 ± 7% +54.3% 79119 ± 31% proc-vmstat.pgmigrate_success
16417 ± 7% +131.4% 37986 ± 78% proc-vmstat.pgreuse
505.28 -10.6% 451.92 stress-ng.membarrier.membarrier_calls_per_sec
97160 -10.5% 86939 stress-ng.membarrier.ops
1618 -10.5% 1448 stress-ng.membarrier.ops_per_sec
55094 ± 5% +277.5% 207976 ± 9% stress-ng.time.involuntary_context_switches
3195 ± 2% -8.3% 2931 stress-ng.time.percent_of_cpu_this_job_got
1921 ± 2% -8.3% 1761 stress-ng.time.system_time
1047923 +5.9% 1109900 stress-ng.time.voluntary_context_switches
5.501e+09 ± 2% -7.8% 5.074e+09 perf-stat.i.branch-instructions
30090 +14.4% 34431 perf-stat.i.context-switches
1.041e+11 ± 2% -7.6% 9.627e+10 perf-stat.i.cpu-cycles
10683 +6.7% 11402 perf-stat.i.cpu-migrations
2.73e+10 ± 2% -7.6% 2.522e+10 perf-stat.i.instructions
5.406e+09 ± 2% -7.8% 4.985e+09 perf-stat.ps.branch-instructions
29571 +14.4% 33836 perf-stat.ps.context-switches
1.024e+11 ± 2% -7.6% 9.46e+10 perf-stat.ps.cpu-cycles
10498 +6.7% 11203 perf-stat.ps.cpu-migrations
2.683e+10 ± 2% -7.6% 2.478e+10 perf-stat.ps.instructions
1.631e+12 ± 2% -7.7% 1.505e+12 perf-stat.total.instructions
698086 ± 4% -12.0% 614339 ± 3% sched_debug.cfs_rq:/.avg_vruntime.avg
918198 ± 7% -13.5% 794083 ± 6% sched_debug.cfs_rq:/.avg_vruntime.max
650282 ± 4% -12.9% 566525 ± 4% sched_debug.cfs_rq:/.avg_vruntime.min
698086 ± 4% -12.0% 614339 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
918198 ± 7% -13.5% 794083 ± 6% sched_debug.cfs_rq:/.min_vruntime.max
650282 ± 4% -12.9% 566525 ± 4% sched_debug.cfs_rq:/.min_vruntime.min
13.48 ± 36% +250.6% 47.25 ± 40% sched_debug.cfs_rq:/.removed.load_avg.avg
77.26 ± 17% +91.9% 148.27 ± 24% sched_debug.cfs_rq:/.removed.load_avg.stddev
5.08 ± 33% +246.5% 17.60 ± 35% sched_debug.cfs_rq:/.removed.runnable_avg.avg
212.33 ± 20% +30.1% 276.17 ± 7% sched_debug.cfs_rq:/.removed.runnable_avg.max
30.44 ± 21% +89.0% 57.52 ± 14% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
5.08 ± 33% +246.6% 17.60 ± 35% sched_debug.cfs_rq:/.removed.util_avg.avg
212.25 ± 21% +30.1% 276.08 ± 7% sched_debug.cfs_rq:/.removed.util_avg.max
30.43 ± 21% +89.0% 57.51 ± 14% sched_debug.cfs_rq:/.removed.util_avg.stddev
15701 +12.8% 17719 sched_debug.cpu.nr_switches.avg
11778 ± 7% +20.3% 14165 ± 8% sched_debug.cpu.nr_switches.min
-202.17 +21.0% -244.58 sched_debug.cpu.nr_uninterruptible.min
1.43 ± 36% -99.6% 0.01 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
0.94 ± 23% -91.9% 0.08 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
1.60 ± 68% -99.9% 0.00 ±223% perf-sched.sch_delay.avg.ms.io_schedule.migration_entry_wait_on_locked.__handle_mm_fault.handle_mm_fault
1.39 ± 8% +71.7% 2.38 ± 7% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_global_expedited
1.95 ± 5% +23.7% 2.41 ± 5% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_private_expedited
0.89 ± 4% -16.0% 0.75 ± 3% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
0.01 ± 25% +75.0% 0.02 ± 34% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.06 ± 11% -37.5% 0.04 ± 40% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.80 ±145% +478.8% 4.62 ± 52% perf-sched.sch_delay.max.ms.__cond_resched.__mutex_lock.constprop.0.membarrier_private_expedited
5.29 ± 41% -99.9% 0.01 ±223% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
6.37 ± 13% -93.7% 0.40 ±223% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
2.22 ± 49% -99.9% 0.00 ±223% perf-sched.sch_delay.max.ms.io_schedule.migration_entry_wait_on_locked.__handle_mm_fault.handle_mm_fault
10.40 ± 13% +32.1% 13.74 ± 5% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_global_expedited
4.55 ± 5% -34.9% 2.96 ± 42% perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.98 ± 4% +33.4% 1.30 ± 6% perf-sched.total_sch_delay.average.ms
22.34 -12.3% 19.59 perf-sched.total_wait_and_delay.average.ms
102076 +18.6% 121096 perf-sched.total_wait_and_delay.count.ms
21.37 -14.4% 18.29 perf-sched.total_wait_time.average.ms
515.07 ± 36% +63.2% 840.46 ± 16% perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
11.25 ± 5% +56.4% 17.59 ± 7% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_global_expedited
15.80 -13.5% 13.67 perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
487.31 ± 4% +16.4% 567.38 ± 2% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
8.00 ± 26% +95.8% 15.67 ± 20% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1384 ± 12% +58.1% 2188 ± 8% perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_global_expedited
10678 ± 7% +270.1% 39521 ± 4% perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_private_expedited
85629 -12.4% 75039 ± 3% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
2443 ± 44% -58.3% 1018 perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
2099 ± 55% -76.1% 501.21 perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
15.94 ± 9% -86.6% 2.13 ±223% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
515.06 ± 36% +63.2% 840.45 ± 16% perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
15.25 ± 3% -85.0% 2.29 ±223% perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
427.24 ± 78% -99.6% 1.55 ±107% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
10.38 ± 53% -95.2% 0.50 ±223% perf-sched.wait_time.avg.ms.io_schedule.migration_entry_wait_on_locked.__handle_mm_fault.handle_mm_fault
48.58 ±185% -94.2% 2.80 ± 99% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
9.86 ± 5% +54.2% 15.21 ± 7% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.membarrier_global_expedited
14.92 -13.4% 12.92 perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
1.30 ± 8% -11.1% 1.15 ± 6% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
487.30 ± 4% +16.4% 567.36 ± 2% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
6.13 ±141% +268.5% 22.60 ± 17% perf-sched.wait_time.max.ms.__cond_resched.__mutex_lock.constprop.0.membarrier_private_expedited
25.13 ± 9% -91.5% 2.13 ±223% perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
25.92 ± 12% -86.1% 3.61 ±223% perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
2260 ± 59% -99.9% 3.00 ±118% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
13.02 ± 43% -96.2% 0.50 ±223% perf-sched.wait_time.max.ms.io_schedule.migration_entry_wait_on_locked.__handle_mm_fault.handle_mm_fault
2443 ± 44% -58.3% 1018 perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
2097 ± 55% -76.1% 500.54 perf-sched.wait_time.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
2025-02-25 2:32 [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression kernel test robot
@ 2025-02-25 9:31 ` Chen Yu
2025-02-25 9:45 ` Vincent Guittot
2025-02-25 12:27 ` Peter Zijlstra
0 siblings, 2 replies; 9+ messages in thread
From: Chen Yu @ 2025-02-25 9:31 UTC (permalink / raw)
To: zihan zhou
Cc: oe-lkp, kernel test robot, lkp, linux-kernel, x86, Peter Zijlstra,
Vincent Guittot, aubrey.li, yu.c.chen
On 2025-02-25 at 10:32:13 +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 6.2% regression of hackbench.throughput on:
>
>
> commit: 2ae891b826958b60919ea21c727f77bcd6ffcc2c ("sched: Reduce the default slice to avoid tasks getting an extra tick")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
>
> [test failed on linux-next/master d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa]
>
> testcase: hackbench
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
>
> nr_threads: 100%
> iterations: 4
> mode: process
> ipc: socket
> cpufreq_governor: performance
>
>
> 39754543 ± 3% +56.8% 62349308 hackbench.time.involuntary_context_switches
>
This patch shrinks the base_slice so the deadline is reached earlier to trigger
tick preemption, IIUC. For the hackbench case, my assumption is that hackbench seems to
encounter more wakeup preemption, which hurts throughput. If more frequent tick preemption
is needed, but more frequent wakeup preemption is not, are we able to do this base_slice
shrink for tick preemption only, rather than for wakeup preemption? A wild guess: can we
use the smaller base_slice of 0.7 in update_deadline() for tick preemption, but keep the old
value of 0.75 in update_deadline() for wakeup preemption during enqueue?
But considering that the 6% regression is not that high, and the user can customize
base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
future (we have encountered some SPECjbb regression due to over-preemption).
thanks,
Chenyu
^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
2025-02-25 9:31 ` Chen Yu
@ 2025-02-25 9:45 ` Vincent Guittot
2025-02-25 10:15 ` Chen Yu
2025-02-25 12:27 ` Peter Zijlstra
1 sibling, 1 reply; 9+ messages in thread
From: Vincent Guittot @ 2025-02-25 9:45 UTC (permalink / raw)
To: Chen Yu
Cc: zihan zhou, oe-lkp, kernel test robot, lkp, linux-kernel, x86,
Peter Zijlstra, aubrey.li, yu.c.chen
On Tue, 25 Feb 2025 at 10:31, Chen Yu <yu.chen.surf@foxmail.com> wrote:
>
> On 2025-02-25 at 10:32:13 +0800, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 6.2% regression of hackbench.throughput on:
> >
> >
> > commit: 2ae891b826958b60919ea21c727f77bcd6ffcc2c ("sched: Reduce the default slice to avoid tasks getting an extra tick")
> > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
> >
> > [test failed on linux-next/master d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa]
> >
> > testcase: hackbench
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > parameters:
> >
> > nr_threads: 100%
> > iterations: 4
> > mode: process
> > ipc: socket
> > cpufreq_governor: performance
> >
> >
> > 39754543 ± 3% +56.8% 62349308 hackbench.time.involuntary_context_switches
> >
>
> This patch shrinks the base_slice so that the deadline is reached earlier to trigger
> tick preemption, IIUC. For the hackbench case, my assumption is that hackbench seems to
For systems with more than 8 CPUs, the base slice was
0.75*(1+ilog2(8)) = 3 ms, which is exactly 3 tick periods at 1000 Hz, but
because the tick period is almost never fully accounted to the task,
the task was running for 4 tick periods instead of 3. The normalized
base_slice has been reduced from 0.75 ms to 0.70 ms, so the base slice
becomes 2.8 ms for 8 or more CPUs, and the main result is that tasks
will run for 3 tick periods instead of 4.
> encounter more wakeup preemption, which hurts throughput. If more frequent tick preemption
> is needed, but more frequent wakeup preemption is not, could we do this base_slice
> shrink for tick preemption only rather than for wakeup preemption? A wild guess: could we
> use the smaller base_slice of 0.7 in update_deadline() for tick preemption, but retain the old
> value of 0.75 in update_deadline() for wakeup preemption during enqueue?
>
> But considering that the 6% regression is not that high, and that the user can customize
> base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> future (we have encountered some SPECjbb regressions due to over-preemption).
>
> thanks,
> Chenyu
>
* Re: [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
2025-02-25 9:45 ` Vincent Guittot
@ 2025-02-25 10:15 ` Chen Yu
0 siblings, 0 replies; 9+ messages in thread
From: Chen Yu @ 2025-02-25 10:15 UTC (permalink / raw)
To: Vincent Guittot
Cc: zihan zhou, oe-lkp, kernel test robot, lkp, linux-kernel, x86,
Peter Zijlstra, aubrey.li, yu.c.chen, yu.chen.surf
On 2025-02-25 at 10:45:35 +0100, Vincent Guittot wrote:
> On Tue, 25 Feb 2025 at 10:31, Chen Yu <yu.chen.surf@foxmail.com> wrote:
> >
> > On 2025-02-25 at 10:32:13 +0800, kernel test robot wrote:
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a 6.2% regression of hackbench.throughput on:
> > >
> > >
> > > commit: 2ae891b826958b60919ea21c727f77bcd6ffcc2c ("sched: Reduce the default slice to avoid tasks getting an extra tick")
> > > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
> > >
> > > [test failed on linux-next/master d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa]
> > >
> > > testcase: hackbench
> > > config: x86_64-rhel-9.4
> > > compiler: gcc-12
> > > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > > parameters:
> > >
> > > nr_threads: 100%
> > > iterations: 4
> > > mode: process
> > > ipc: socket
> > > cpufreq_governor: performance
> > >
> > >
> > > 39754543 ± 3% +56.8% 62349308 hackbench.time.involuntary_context_switches
> > >
> >
> > This patch shrinks the base_slice so that the deadline is reached earlier to trigger
> > tick preemption, IIUC. For the hackbench case, my assumption is that hackbench seems to
>
> For systems with more than 8 CPUs, the base slice was
> 0.75*(1+ilog2(8)) = 3 ms, which is exactly 3 tick periods at 1000 Hz, but
> because the tick period is almost never fully accounted to the task,
> the task was running for 4 tick periods instead of 3. The normalized
> base_slice has been reduced from 0.75 ms to 0.70 ms, so the base slice
> becomes 2.8 ms for 8 or more CPUs, and the main result is that tasks
> will run for 3 tick periods instead of 4.
>
Thanks for the detailed explanation; I now understand the background.
It is a correct fix for tick preemption and slightly affects wakeup
preemption (smaller deadline in place_entity()).
thanks,
Chenyu
> > encounter more wakeup preemption, which hurts throughput. If more frequent tick preemption
> > is needed, but more frequent wakeup preemption is not, could we do this base_slice
> > shrink for tick preemption only rather than for wakeup preemption? A wild guess: could we
> > use the smaller base_slice of 0.7 in update_deadline() for tick preemption, but retain the old
> > value of 0.75 in update_deadline() for wakeup preemption during enqueue?
> >
> > But considering that the 6% regression is not that high, and that the user can customize
> > base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> > future (we have encountered some SPECjbb regressions due to over-preemption).
> >
> > thanks,
> > Chenyu
> >
* Re: [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
2025-02-25 9:31 ` Chen Yu
2025-02-25 9:45 ` Vincent Guittot
@ 2025-02-25 12:27 ` Peter Zijlstra
2025-02-25 13:15 ` Chen Yu
1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2025-02-25 12:27 UTC (permalink / raw)
To: Chen Yu
Cc: zihan zhou, oe-lkp, kernel test robot, lkp, linux-kernel, x86,
Vincent Guittot, aubrey.li, yu.c.chen
On Tue, Feb 25, 2025 at 05:31:34PM +0800, Chen Yu wrote:
>
> But considering that the 6% regression is not that high, and that the user can customize
> base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> future (we have encountered some SPECjbb regressions due to over-preemption).
You can specify a per-task slice using sched_attr::sched_runtime. Also
see commit 857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to
set request/slice suggestion")
* Re: [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
2025-02-25 12:27 ` Peter Zijlstra
@ 2025-02-25 13:15 ` Chen Yu
2025-02-25 13:42 ` Qais Yousef
0 siblings, 1 reply; 9+ messages in thread
From: Chen Yu @ 2025-02-25 13:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: zihan zhou, oe-lkp, kernel test robot, lkp, linux-kernel, x86,
Vincent Guittot, aubrey.li, yu.c.chen
On 2025-02-25 at 13:27:05 +0100, Peter Zijlstra wrote:
> On Tue, Feb 25, 2025 at 05:31:34PM +0800, Chen Yu wrote:
>
> >
> > But considering that the 6% regression is not that high, and that the user can customize
> > base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> > future (we have encountered some SPECjbb regressions due to over-preemption).
>
> You can specify a per-task slice using sched_attr::sched_runtime. Also
> see commit 857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to
> set request/slice suggestion")
>
>
Thanks, we'll give it a try during the next test cycle.
thanks,
Chenyu
* Re: [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
2025-02-25 13:15 ` Chen Yu
@ 2025-02-25 13:42 ` Qais Yousef
2025-02-25 15:35 ` Chen Yu
0 siblings, 1 reply; 9+ messages in thread
From: Qais Yousef @ 2025-02-25 13:42 UTC (permalink / raw)
To: Chen Yu
Cc: Peter Zijlstra, zihan zhou, oe-lkp, kernel test robot, lkp,
linux-kernel, x86, Vincent Guittot, aubrey.li, yu.c.chen
On 02/25/25 21:15, Chen Yu wrote:
> On 2025-02-25 at 13:27:05 +0100, Peter Zijlstra wrote:
> > On Tue, Feb 25, 2025 at 05:31:34PM +0800, Chen Yu wrote:
> >
> > >
> > > But considering that the 6% regression is not that high, and that the user can customize
> > > base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> > > future (we have encountered some SPECjbb regressions due to over-preemption).
> >
> > You can specify a per-task slice using sched_attr::sched_runtime. Also
> > see commit 857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to
> > set request/slice suggestion")
> >
> >
>
> Thanks, we'll give it a try during the next test cycle.
Could you also try with HRTICK enabled?
echo HRTICK | sudo tee /sys/kernel/debug/sched/features
* Re: [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
2025-02-25 13:42 ` Qais Yousef
@ 2025-02-25 15:35 ` Chen Yu
2025-02-25 23:10 ` Qais Yousef
0 siblings, 1 reply; 9+ messages in thread
From: Chen Yu @ 2025-02-25 15:35 UTC (permalink / raw)
To: Qais Yousef
Cc: Peter Zijlstra, zihan zhou, oe-lkp, kernel test robot, lkp,
linux-kernel, x86, Vincent Guittot, aubrey.li, yu.c.chen
On 2025-02-25 at 13:42:20 +0000, Qais Yousef wrote:
> On 02/25/25 21:15, Chen Yu wrote:
> > On 2025-02-25 at 13:27:05 +0100, Peter Zijlstra wrote:
> > > On Tue, Feb 25, 2025 at 05:31:34PM +0800, Chen Yu wrote:
> > >
> > > >
> > > > But considering that the 6% regression is not that high, and that the user can customize
> > > > base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> > > > future (we have encountered some SPECjbb regressions due to over-preemption).
> > >
> > > You can specify a per-task slice using sched_attr::sched_runtime. Also
> > > see commit 857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to
> > > set request/slice suggestion")
> > >
> > >
> >
> > Thanks, we'll give it a try during the next test cycle.
>
> Could you also try with HRTICK enabled?
>
> echo HRTICK | sudo tee /sys/kernel/debug/sched/features
Sure. Is HRTICK supposed to encourage preemption? I thought
hackbench might be sensitive to preemption (its throughput degrades with it).
thanks,
Chenyu
* Re: [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression
2025-02-25 15:35 ` Chen Yu
@ 2025-02-25 23:10 ` Qais Yousef
0 siblings, 0 replies; 9+ messages in thread
From: Qais Yousef @ 2025-02-25 23:10 UTC (permalink / raw)
To: Chen Yu
Cc: Peter Zijlstra, zihan zhou, oe-lkp, kernel test robot, lkp,
linux-kernel, x86, Vincent Guittot, aubrey.li, yu.c.chen
On 02/25/25 23:35, Chen Yu wrote:
> On 2025-02-25 at 13:42:20 +0000, Qais Yousef wrote:
> > On 02/25/25 21:15, Chen Yu wrote:
> > > On 2025-02-25 at 13:27:05 +0100, Peter Zijlstra wrote:
> > > > On Tue, Feb 25, 2025 at 05:31:34PM +0800, Chen Yu wrote:
> > > >
> > > > >
> > > > > But considering that the 6% regression is not that high, and that the user can customize
> > > > > base_slice via debugfs on demand, we can keep an eye on this and revisit it in the
> > > > > future (we have encountered some SPECjbb regressions due to over-preemption).
> > > >
> > > > You can specify a per-task slice using sched_attr::sched_runtime. Also
> > > > see commit 857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to
> > > > set request/slice suggestion")
> > > >
> > > >
> > >
> > > Thanks, we'll give it a try during the next test cycle.
> >
> > Could you also try with HRTICK enabled?
> >
> > echo HRTICK | sudo tee /sys/kernel/debug/sched/features
>
> Sure. Is HRTICK supposed to encourage preemption? I thought
> hackbench might be sensitive to preemption (its throughput degrades with it).
Yes, my bad; I don't know why I read this as preemption happening later
rather than earlier. Please ignore me then.
Thanks
--
Qais Yousef
end of thread, other threads: [~2025-02-25 23:10 UTC | newest]
Thread overview: 9+ messages
-- links below jump to the message on this page --
2025-02-25 2:32 [tip:sched/core] [sched] 2ae891b826: hackbench.throughput 6.2% regression kernel test robot
2025-02-25 9:31 ` Chen Yu
2025-02-25 9:45 ` Vincent Guittot
2025-02-25 10:15 ` Chen Yu
2025-02-25 12:27 ` Peter Zijlstra
2025-02-25 13:15 ` Chen Yu
2025-02-25 13:42 ` Qais Yousef
2025-02-25 15:35 ` Chen Yu
2025-02-25 23:10 ` Qais Yousef