* [opencloudos:next] [sched]  44f5072e76:  netperf.Throughput_Mbps 14.4% improvement
@ 2024-09-24 12:42 kernel test robot
From: kernel test robot @ 2024-09-24 12:42 UTC
  To: kaixuxia, frankjpliu, kasong, sagazchen, kernelxing, aurelianliu,
	deshengwu, flyingpeng, jason.zeng, wu.zheng, yingbao.jia,
	pei.p.jia
  Cc: oe-lkp, lkp, oliver.sang



Hello,

kernel test robot noticed a 14.4% improvement of netperf.Throughput_Mbps on:


commit: 44f5072e7684629650ca645a35698d5388c23ad7 ("Revert "sched: adaptive default skew_tick value"")
https://gitee.com/OpenCloudOS/OpenCloudOS-Kernel.git next

testcase: netperf
test machine: 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 8592+ (Emerald Rapids) with 256G memory
parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 50%
	cluster: cs-localhost
	send_size: 10K
	test: SCTP_STREAM_MANY
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240924/202409241630.7e2e7b8a-oliver.sang@intel.com

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/send_size/tbox_group/test/testcase:
  cs-localhost/gcc-12/performance/ipv4/x86_64-oc_stream_base_config/50%/debian-12-x86_64-20240206.cgz/300s/10K/lkp-emr-2sp1/SCTP_STREAM_MANY/netperf
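The two slash-separated lines above appear to pair up positionally: the first lists parameter names, the second the values used for this run. A minimal sketch of that pairing follows, assuming no value itself contains a "/" (true for the values shown here). The helper name pair_params is hypothetical and is not part of lkp-tests.

```python
# Strings copied verbatim from the report; the trailing ":" on the key line is dropped below.
KEYS = ("cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/"
        "rootfs/runtime/send_size/tbox_group/test/testcase:")
VALUES = ("cs-localhost/gcc-12/performance/ipv4/x86_64-oc_stream_base_config/"
          "50%/debian-12-x86_64-20240206.cgz/300s/10K/lkp-emr-2sp1/"
          "SCTP_STREAM_MANY/netperf")


def pair_params(keys: str, values: str) -> dict[str, str]:
    """Pair names and values by position (hypothetical helper, assumes no '/' in values)."""
    names = keys.strip().rstrip(":").split("/")
    vals = values.strip().split("/")
    if len(names) != len(vals):
        raise ValueError(f"{len(names)} names but {len(vals)} values")
    return dict(zip(names, vals))


params = pair_params(KEYS, VALUES)
for name, value in params.items():
    print(f"{name:>17}: {value}")
```

The length check is there so that a value containing an unexpected "/" fails loudly instead of silently shifting every later parameter.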

commit: 
  a1aa259039 ("Merge branch 'likexu/kvm/cube-optimization' into 'master' (merge request !158)")
  44f5072e76 ("Revert "sched: adaptive default skew_tick value"")

a1aa2590392cbeea 44f5072e7684629650ca645a356 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 3.096e+08 ±  2%     +13.9%  3.526e+08 ±  3%  cpuidle..usage
      0.00 ± 28%     -42.6%       0.00 ± 11%  sched_debug.cpu.next_balance.stddev
     52332 ± 21%     +78.9%      93622 ± 20%  sched_debug.cpu.nr_switches.min
     17.74 ±  2%     +15.3%      20.45 ±  2%  vmstat.procs.r
   1985288 ±  2%     +14.7%    2276309 ±  3%  vmstat.system.cs
     53740 ±  2%      +8.4%      58264 ±  3%  vmstat.system.in
      0.99 ±  2%      +0.2        1.23 ±  2%  mpstat.cpu.all.soft%
      5.10 ±  2%      +0.9        6.05 ±  3%  mpstat.cpu.all.sys%
      0.19            +0.0        0.22        mpstat.cpu.all.usr%
      8.35 ±  3%     +23.1%      10.28 ±  3%  mpstat.max_utilization_pct
     17.50 ± 21%     +46.7%      25.67 ± 10%  perf-c2c.DRAM.local
      1456 ±  5%    +124.4%       3269 ±  2%  perf-c2c.DRAM.remote
      9427 ±  6%     +17.7%      11098 ±  6%  perf-c2c.HITM.local
    720.17 ±  5%     +39.4%       1004 ±  6%  perf-c2c.HITM.remote
     10148 ±  6%     +19.3%      12102 ±  5%  perf-c2c.HITM.total
   5468174           +12.3%    6142120 ±  5%  meminfo.Cached
   3087455 ±  3%     +21.8%    3760849 ±  8%  meminfo.Committed_AS
   2141836 ±  4%     +31.5%    2815792 ± 11%  meminfo.Inactive
   2140829 ±  4%     +31.4%    2813119 ± 11%  meminfo.Inactive(anon)
     81575           +20.7%      98438 ±  8%  meminfo.Mapped
   2148304 ±  4%     +31.3%    2820586 ± 11%  meminfo.Shmem
      4117 ± 88%     +73.3%       7133 ± 50%  numa-vmstat.node2.nr_mapped
    562669 ±  8%     +28.1%     720575 ± 14%  numa-vmstat.node3.nr_file_pages
    529864 ±  5%     +32.2%     700439 ± 11%  numa-vmstat.node3.nr_inactive_anon
      9314 ±  2%     +38.3%      12876 ± 16%  numa-vmstat.node3.nr_mapped
    529765 ±  5%     +32.2%     700351 ± 11%  numa-vmstat.node3.nr_shmem
    529864 ±  5%     +32.2%     700439 ± 11%  numa-vmstat.node3.nr_zone_inactive_anon
     16353 ± 87%     +73.6%      28387 ± 49%  numa-meminfo.node2.Mapped
   2250470 ±  8%     +28.0%    2880924 ± 14%  numa-meminfo.node3.FilePages
   2120053 ±  5%     +32.1%    2800479 ± 11%  numa-meminfo.node3.Inactive
   2119249 ±  5%     +32.1%    2800379 ± 11%  numa-meminfo.node3.Inactive(anon)
     35285 ±  2%     +39.5%      49230 ± 15%  numa-meminfo.node3.Mapped
   2858241 ±  4%     +25.2%    3577576 ±  9%  numa-meminfo.node3.MemUsed
   2118855 ±  5%     +32.1%    2800027 ± 11%  numa-meminfo.node3.Shmem
      2420 ±  2%     +14.4%       2770 ±  3%  netperf.ThroughputBoth_Mbps
    309868 ±  2%     +14.4%     354616 ±  3%  netperf.ThroughputBoth_total_Mbps
      2420 ±  2%     +14.4%       2770 ±  3%  netperf.Throughput_Mbps
    309868 ±  2%     +14.4%     354616 ±  3%  netperf.Throughput_total_Mbps
     11228 ±  2%     +27.6%      14328 ±  3%  netperf.time.involuntary_context_switches
      1036           +15.9%       1200 ±  3%  netperf.time.percent_of_cpu_this_job_got
      3068           +15.9%       3556 ±  3%  netperf.time.system_time
     59.04           +11.4%      65.76 ±  2%  netperf.time.user_time
 1.135e+09 ±  2%     +14.4%  1.299e+09 ±  3%  netperf.workload
   1366778           +12.4%    1535834 ±  5%  proc-vmstat.nr_file_pages
    534938 ±  4%     +31.5%     703580 ± 11%  proc-vmstat.nr_inactive_anon
     20750           +20.4%      24985 ±  8%  proc-vmstat.nr_mapped
    536809 ±  4%     +31.4%     705449 ± 11%  proc-vmstat.nr_shmem
    534938 ±  4%     +31.5%     703580 ± 11%  proc-vmstat.nr_zone_inactive_anon
 1.466e+09 ±  2%     +14.5%  1.678e+09 ±  3%  proc-vmstat.numa_hit
 1.464e+09 ±  2%     +14.5%  1.676e+09 ±  3%  proc-vmstat.numa_local
 8.422e+09 ±  2%     +14.5%  9.639e+09 ±  3%  proc-vmstat.pgalloc_normal
   1647051            +1.8%    1677130        proc-vmstat.pgfault
 8.421e+09 ±  2%     +14.5%  9.638e+09 ±  3%  proc-vmstat.pgfree
 1.169e+10 ±  2%     +14.7%  1.341e+10 ±  3%  perf-stat.i.branch-instructions
      0.48            -0.0        0.47        perf-stat.i.branch-miss-rate%
  55170417 ±  2%     +11.1%   61308068 ±  2%  perf-stat.i.branch-misses
  80142341           +11.7%   89535369 ±  2%  perf-stat.i.cache-misses
  1.68e+09 ±  2%     +14.6%  1.925e+09 ±  3%  perf-stat.i.cache-references
   2005173 ±  2%     +14.8%    2301925 ±  3%  perf-stat.i.context-switches
 7.149e+10 ±  2%     +15.4%  8.253e+10 ±  4%  perf-stat.i.cpu-cycles
    474.46            +3.5%     491.30        perf-stat.i.cpu-migrations
 6.504e+10 ±  2%     +14.7%  7.462e+10 ±  3%  perf-stat.i.instructions
      7.83 ±  2%     +14.8%       8.99 ±  3%  perf-stat.i.metric.K/sec
      5073            +2.2%       5184        perf-stat.i.minor-faults
      5073            +2.2%       5184        perf-stat.i.page-faults
      0.02 ±  6%     -25.8%       0.02 ± 12%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.__x64_sys_wait4
      0.05 ±  4%     -19.0%       0.04 ± 10%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.09 ±  2%     +12.2%       0.10 ±  3%  perf-sched.sch_delay.avg.ms.schedule_timeout.sctp_wait_for_sndbuf.sctp_sendmsg_to_asoc.sctp_sendmsg
      0.11 ±  8%   +3964.3%       4.51 ±216%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.30 ±  5%     -18.8%       0.25 ± 13%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1.87 ±  5%     -25.5%       1.40 ± 11%  perf-sched.total_wait_and_delay.average.ms
   2029785 ±  5%     +32.8%    2695971 ± 11%  perf-sched.total_wait_and_delay.count.ms
      1.87 ±  5%     -25.7%       1.39 ± 11%  perf-sched.total_wait_time.average.ms
     25.91 ± 63%     -59.0%      10.62 ± 21%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
    189.87 ± 16%     +91.0%     362.66 ± 19%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.schedule_hrtimeout_range.do_poll.constprop.0
      0.30 ±  5%     -24.8%       0.23 ± 11%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
     49.67 ± 13%     +52.0%      75.50 ± 19%  perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.sctp_sendmsg.inet_sendmsg
      1536           -16.7%       1280        perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.wait_for_completion.affine_move_task.__set_cpus_allowed_ptr_locked
     29.33 ± 13%     -28.4%      21.00 ± 10%  perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     66.50 ± 15%     -47.9%      34.67 ± 21%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.schedule_hrtimeout_range.do_poll.constprop.0
   1991444 ±  5%     +33.5%    2658433 ± 11%  perf-sched.wait_and_delay.count.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
      2429           -10.2%       2181        perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.74            -9.3%       2.48        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.wait_for_completion.affine_move_task.__set_cpus_allowed_ptr_locked
     25.87 ± 63%     -59.1%      10.58 ± 21%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
    189.78 ± 16%     +91.0%     362.52 ± 19%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.schedule_hrtimeout_range.do_poll.constprop.0
      0.30 ±  5%     -25.0%       0.22 ± 11%  perf-sched.wait_time.avg.ms.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
     49.66 ±142%     -87.0%       6.45 ± 92%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.irqentry_exit
      0.84 ±  2%      +0.0        0.89 ±  2%  perf-profile.calltrace.cycles-pp.__sk_mem_reclaim.sctp_wfree.skb_release_head_state.consume_skb.sctp_chunk_put
      0.82 ±  3%      +0.1        0.88 ±  2%  perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.__sk_mem_reclaim.sctp_wfree.skb_release_head_state.consume_skb
      1.57            +0.1        1.63        perf-profile.calltrace.cycles-pp.sctp_wfree.skb_release_head_state.consume_skb.sctp_chunk_put.sctp_chunk_free
      1.65            +0.1        1.72        perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.sctp_chunk_put.sctp_chunk_free.sctp_outq_sack
      4.22            -0.1        4.08 ±  2%  perf-profile.children.cycles-pp.__schedule
      2.01            -0.1        1.90 ±  3%  perf-profile.children.cycles-pp.schedule_idle
      0.14 ± 11%      -0.1        0.05 ±  8%  perf-profile.children.cycles-pp.nohz_run_idle_balance
      0.27 ±  4%      -0.1        0.22 ±  4%  perf-profile.children.cycles-pp.tick_nohz_idle_exit
      0.22 ±  9%      -0.0        0.17 ±  5%  perf-profile.children.cycles-pp.tick_nohz_idle_stop_tick
      0.20 ±  6%      -0.0        0.16 ±  6%  perf-profile.children.cycles-pp.tick_nohz_stop_tick
      0.78            -0.0        0.74 ±  2%  perf-profile.children.cycles-pp.sctp_packet_transmit_chunk
      0.18 ±  7%      -0.0        0.14 ±  5%  perf-profile.children.cycles-pp.quiet_vmstat
      0.10 ±  5%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.tick_nohz_restart_sched_tick
      0.11 ±  3%      -0.0        0.10 ±  5%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.05 ±  7%      +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.tick_program_event
      0.53            +0.0        0.55 ±  2%  perf-profile.children.cycles-pp.drain_stock
      0.18 ±  5%      +0.0        0.20 ±  3%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.04 ± 44%      +0.0        0.07 ±  7%  perf-profile.children.cycles-pp.idle_cpu
      0.22 ±  6%      +0.0        0.25 ±  4%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.22 ±  7%      +0.0        0.26 ±  4%  perf-profile.children.cycles-pp.find_busiest_group
      0.90 ±  2%      +0.0        0.94        perf-profile.children.cycles-pp.refill_stock
      0.26 ±  5%      +0.0        0.31 ±  4%  perf-profile.children.cycles-pp.load_balance
      0.14 ±  3%      +0.1        0.20 ±  4%  perf-profile.children.cycles-pp.tick_sched_handle
      0.08            +0.1        0.14 ±  5%  perf-profile.children.cycles-pp.scheduler_tick
      1.68            +0.1        1.74        perf-profile.children.cycles-pp.sctp_wfree
      0.13 ±  4%      +0.1        0.19 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.00            +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.nohz_balancer_kick
      0.06 ±  9%      +0.1        0.13 ±  8%  perf-profile.children.cycles-pp.update_blocked_averages
      0.16 ±  2%      +0.1        0.24 ±  5%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.00            +0.1        0.07 ±  9%  perf-profile.children.cycles-pp.trigger_load_balance
      0.15 ±  3%      +0.1        0.22 ±  5%  perf-profile.children.cycles-pp.tick_sched_timer
      0.08 ± 13%      +0.1        0.16 ±  2%  perf-profile.children.cycles-pp.rebalance_domains
      4.48            +0.1        4.56        perf-profile.children.cycles-pp.consume_skb
      1.98 ±  2%      +0.1        2.06 ±  2%  perf-profile.children.cycles-pp.__sk_mem_reduce_allocated
      3.35            +0.1        3.45        perf-profile.children.cycles-pp.skb_release_head_state
      0.21 ±  3%      +0.1        0.31 ±  4%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.21 ±  2%      +0.1        0.32 ±  4%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.19 ± 10%      +0.1        0.31 ±  7%  perf-profile.children.cycles-pp._nohz_idle_balance
      0.52 ±  4%      +0.1        0.64 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.13 ± 11%      +0.1        0.26 ±  8%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.56 ±  4%      +0.1        0.70 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.12 ± 11%      +0.1        0.26 ±  8%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.36 ±  7%      +0.2        0.53 ±  6%  perf-profile.children.cycles-pp.irq_exit_rcu
      0.36 ±  6%      +0.2        0.53 ±  6%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.20 ± 10%      +0.2        0.37 ±  6%  perf-profile.children.cycles-pp.run_rebalance_domains
      0.10            -0.0        0.08        perf-profile.self.cycles-pp.syscall_enter_from_user_mode
      0.09 ±  5%      +0.0        0.10 ±  4%  perf-profile.self.cycles-pp.ipv4_dst_check
      0.18 ±  4%      +0.0        0.20 ±  3%  perf-profile.self.cycles-pp.ktime_get
      0.09 ±  5%      +0.0        0.11 ±  8%  perf-profile.self.cycles-pp.lock_sock_nested
      0.16 ±  4%      +0.0        0.18 ±  3%  perf-profile.self.cycles-pp.sctp_outq_tail
      0.03 ± 70%      +0.0        0.06 ±  7%  perf-profile.self.cycles-pp.idle_cpu
      0.00            +0.1        0.05 ±  7%  perf-profile.self.cycles-pp.sctp_sf_do_prm_send
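For readers who want to post-process the comparison table, the sketch below parses one row and recomputes the relative change from the two means. It assumes the column layout visible above (base mean, optional "± N%" stddev, change, new mean, optional stddev, metric name) and that a change without a trailing "%" is an absolute difference, as in the mpstat and perf-profile rows. The regex and the names parse_row and pct_change are illustrative, not part of lkp-tests. Because the table prints rounded means, a recomputed value can differ from the reported one by a rounding step.

```python
import re

# One table row: base [± sd%]  change  new [± sd%]  metric
ROW = re.compile(
    r"^\s*(?P<base>\S+)(?:\s*±\s*(?P<base_sd>\d+)%)?"
    r"\s+(?P<change>[+-]\d+(?:\.\d+)?)(?P<pct>%)?"
    r"\s+(?P<new>\S+)(?:\s*±\s*(?P<new_sd>\d+)%)?"
    r"\s+(?P<metric>\S+)\s*$"
)


def parse_row(line: str) -> dict:
    """Split a comparison row into typed fields (hypothetical helper)."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    return {
        "metric": m["metric"],
        "base": float(m["base"]),
        "new": float(m["new"]),
        "change": float(m["change"]),
        "change_is_percent": m["pct"] is not None,
        "base_stddev_pct": int(m["base_sd"]) if m["base_sd"] else None,
        "new_stddev_pct": int(m["new_sd"]) if m["new_sd"] else None,
    }


def pct_change(base: float, new: float) -> float:
    """Relative change of new versus base, in percent."""
    return (new - base) / base * 100.0


# Rows copied from the table above.
total = parse_row("    309868 ±  2%     +14.4%     354616 ±  3%  netperf.Throughput_total_Mbps")
usr = parse_row("      0.19            +0.0        0.22        mpstat.cpu.all.usr%")

print(total["metric"], f"{pct_change(total['base'], total['new']):+.1f}%")
print(usr["metric"], "percent change?", usr["change_is_percent"])
```

For the netperf.Throughput_total_Mbps row, the recomputed value rounds to the +14.4% shown in the report.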




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

