* [linux-next:master] [udp]  6471658dc6:  netperf.Throughput_Mbps 200.0% improvement
From: kernel test robot @ 2025-09-26  8:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: oe-lkp, lkp, Paolo Abeni, Willem de Bruijn, David Ahern,
	Kuniyuki Iwashima, Jakub Kicinski, netdev, oliver.sang



Hello,

kernel test robot noticed a 200.0% improvement of netperf.Throughput_Mbps on:


commit: 6471658dc66c670580a7616e75f51b52917e7883 ("udp: use skb_attempt_defer_free()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


testcase: netperf
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 50%
	cluster: cs-localhost
	test: UDP_STREAM
	cpufreq_governor: performance
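
A rough sanity check on the parameters above, as a sketch only. It assumes that "nr_threads: 50%" means "run as many netperf instances as 50% of the machine's hardware threads"; the report does not state this. The throughput figures are copied from the comparison table further down, and the helper name is hypothetical.

```python
# Assumption (not stated in the report): nr_threads is a percentage of the
# test machine's hardware threads, and one netperf instance runs per thread.
HW_THREADS = 192          # "192 threads 2 sockets" from the test machine line
NR_THREADS_PERCENT = 50   # "nr_threads: 50%"


def instances_from_percent(hw_threads: int, percent: int) -> int:
    """Hypothetical helper: instance count implied by a percentage."""
    return hw_threads * percent // 100


expected_instances = instances_from_percent(HW_THREADS, NR_THREADS_PERCENT)

# Cross-check against the table: total throughput divided by the
# per-instance throughput, for the base commit and the patched commit.
implied_base = 1958214 / 20398   # Throughput_total_Mbps / Throughput_Mbps
implied_new = 5874063 / 61188

print(expected_instances, round(implied_base, 2), round(implied_new, 2))
```

Both ratios come out at about 96, which is consistent with the assumed reading of the parameter but does not prove it.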






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250926/202509261609.dec14b91-lkp@intel.com

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
  cs-localhost/gcc-14/performance/ipv4/x86_64-rhel-9.4/50%/debian-13-x86_64-20250902.cgz/300s/lkp-srf-2sp3/UDP_STREAM/netperf

commit: 
  3cd04c8f4a ("udp: make busylock per socket")
  6471658dc6 ("udp: use skb_attempt_defer_free()")

3cd04c8f4afed71a 6471658dc66c670580a7616e75f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 6.079e+09 ±  4%     +47.8%  8.983e+09        cpuidle..time
 4.012e+08          +320.9%  1.689e+09 ±  2%  cpuidle..usage
   9360404           +22.3%   11449805 ±  4%  numa-meminfo.node1.Active
   9360396           +22.3%   11449799 ±  4%  numa-meminfo.node1.Active(anon)
   8894257 ±  3%     +22.2%   10867440 ±  3%  numa-meminfo.node1.Shmem
 1.044e+09 ±  3%    +206.8%  3.203e+09        numa-numastat.node0.local_node
 1.044e+09 ±  3%    +206.8%  3.204e+09        numa-numastat.node0.numa_hit
 1.013e+09 ±  2%    +218.0%  3.221e+09        numa-numastat.node1.local_node
 1.013e+09 ±  2%    +218.0%  3.221e+09        numa-numastat.node1.numa_hit
      9.93 ±  5%      +4.4       14.28        mpstat.cpu.all.idle%
      0.59            +1.3        1.89        mpstat.cpu.all.irq%
      1.77           +12.4       14.15 ±  2%  mpstat.cpu.all.soft%
     86.78           -18.8       67.94        mpstat.cpu.all.sys%
      0.93            +0.8        1.74        mpstat.cpu.all.usr%
 1.044e+09 ±  3%    +206.9%  3.204e+09        numa-vmstat.node0.numa_hit
 1.044e+09 ±  3%    +206.9%  3.203e+09        numa-vmstat.node0.numa_local
   2339319           +22.4%    2863299 ±  4%  numa-vmstat.node1.nr_active_anon
   2222754 ±  3%     +22.3%    2717697 ±  3%  numa-vmstat.node1.nr_shmem
   2339318           +22.4%    2863297 ±  4%  numa-vmstat.node1.nr_zone_active_anon
 1.013e+09 ±  2%    +218.1%  3.221e+09        numa-vmstat.node1.numa_hit
 1.013e+09 ±  2%    +218.1%  3.221e+09        numa-vmstat.node1.numa_local
   9763805 ±  3%     +21.6%   11869079 ±  3%  meminfo.Active
   9763788 ±  3%     +21.6%   11869062 ±  3%  meminfo.Active(anon)
    805138           +15.5%     929863        meminfo.AnonPages
  12584871 ±  2%     +15.7%   14565534 ±  2%  meminfo.Cached
   9930753 ±  3%     +21.2%   12038335 ±  3%  meminfo.Committed_AS
  16167194           +14.9%   18577687 ±  2%  meminfo.Memused
   8962742 ±  3%     +22.1%   10943457 ±  3%  meminfo.Shmem
  16392623           +14.4%   18753189 ±  2%  meminfo.max_used_kB
     38913          +200.8%     117050        netperf.ThroughputBoth_Mbps
   3735655          +200.8%   11236826        netperf.ThroughputBoth_total_Mbps
     18515          +201.7%      55862        netperf.ThroughputRecv_Mbps
   1777441          +201.7%    5362763        netperf.ThroughputRecv_total_Mbps
     20398          +200.0%      61188        netperf.Throughput_Mbps
   1958214          +200.0%    5874063        netperf.Throughput_total_Mbps
  88004812 ±  8%     -78.3%   19110782 ± 18%  netperf.time.involuntary_context_switches
     41333           +18.4%      48917        netperf.time.minor_page_faults
      9067           -24.1%       6883        netperf.time.percent_of_cpu_this_job_got
     27208           -25.1%      20391        netperf.time.system_time
    158.64          +115.0%     341.02        netperf.time.user_time
 2.139e+09          +200.8%  6.433e+09        netperf.workload
   2441370 ±  3%     +21.6%    2967589 ±  3%  proc-vmstat.nr_active_anon
    201274           +15.5%     232430        proc-vmstat.nr_anon_pages
   6049392            -1.0%    5989283        proc-vmstat.nr_dirty_background_threshold
  12113577            -1.0%   11993210        proc-vmstat.nr_dirty_threshold
   3146655 ±  2%     +15.7%    3641743 ±  2%  proc-vmstat.nr_file_pages
  60865222            -1.0%   60263229        proc-vmstat.nr_free_pages
  60728673            -0.9%   60156480        proc-vmstat.nr_free_pages_blocks
   2241122 ±  3%     +22.1%    2736223 ±  3%  proc-vmstat.nr_shmem
     43088            +2.6%      44210        proc-vmstat.nr_slab_reclaimable
   2441370 ±  3%     +21.6%    2967589 ±  3%  proc-vmstat.nr_zone_active_anon
     46931 ± 26%   +1165.5%     593929 ± 21%  proc-vmstat.numa_hint_faults
     35701 ± 34%   +1506.1%     573389 ± 22%  proc-vmstat.numa_hint_faults_local
 2.057e+09          +212.3%  6.425e+09        proc-vmstat.numa_hit
 2.057e+09          +212.3%  6.424e+09        proc-vmstat.numa_local
     10954 ±  3%     +77.0%      19391 ±  2%  proc-vmstat.numa_pages_migrated
     95835 ± 35%    +588.2%     659561 ± 20%  proc-vmstat.numa_pte_updates
 1.641e+10          +212.8%  5.132e+10        proc-vmstat.pgalloc_normal
   1186751           +45.6%    1727586 ±  7%  proc-vmstat.pgfault
 1.641e+10          +212.8%  5.132e+10        proc-vmstat.pgfree
     10954 ±  3%     +77.0%      19391 ±  2%  proc-vmstat.pgmigrate_success
     48468            +7.4%      52040        proc-vmstat.pgreuse
 1.689e+10          +108.1%  3.514e+10        perf-stat.i.branch-instructions
  38251661          +111.9%   81050915        perf-stat.i.branch-misses
      0.92 ± 45%      +3.6        4.57 ± 41%  perf-stat.i.cache-miss-rate%
   5.1e+09           -84.4%  7.944e+08 ± 13%  perf-stat.i.cache-references
   3325973 ±  2%    +251.7%   11696843 ±  2%  perf-stat.i.context-switches
      6.70           -56.4%       2.92        perf-stat.i.cpi
 5.565e+11            -3.1%  5.393e+11        perf-stat.i.cpu-cycles
      2315 ± 14%   +1249.5%      31253 ±  9%  perf-stat.i.cpu-migrations
 8.352e+10          +121.5%   1.85e+11        perf-stat.i.instructions
      0.15          +124.3%       0.34        perf-stat.i.ipc
     17.32 ±  2%    +251.7%      60.92 ±  2%  perf-stat.i.metric.K/sec
      3553           +50.1%       5334 ±  8%  perf-stat.i.minor-faults
      3553           +50.1%       5334 ±  8%  perf-stat.i.page-faults
      0.62 ± 68%      +3.8        4.40 ± 41%  perf-stat.overall.cache-miss-rate%
      6.66           -56.2%       2.92        perf-stat.overall.cpi
      0.15          +128.5%       0.34        perf-stat.overall.ipc
     11795           -26.6%       8659        perf-stat.overall.path-length
 1.683e+10          +108.1%  3.502e+10        perf-stat.ps.branch-instructions
  38128826          +111.9%   80783216        perf-stat.ps.branch-misses
 5.083e+09           -84.4%  7.918e+08 ± 13%  perf-stat.ps.cache-references
   3315367 ±  2%    +251.7%   11658619 ±  2%  perf-stat.ps.context-switches
 5.547e+11            -3.1%  5.375e+11        perf-stat.ps.cpu-cycles
      2310 ± 14%   +1247.8%      31146 ±  9%  perf-stat.ps.cpu-migrations
 8.326e+10          +121.4%  1.844e+11        perf-stat.ps.instructions
      3534           +50.0%       5302 ±  8%  perf-stat.ps.minor-faults
      3534           +50.0%       5302 ±  8%  perf-stat.ps.page-faults
 2.523e+13          +120.8%   5.57e+13        perf-stat.total.instructions
  26243720           -28.6%   18749089        sched_debug.cfs_rq:/.avg_vruntime.avg
  28003592           -25.9%   20752934 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.max
  25141866           -32.7%   16920920        sched_debug.cfs_rq:/.avg_vruntime.min
      0.30 ±  4%     +26.3%       0.38 ±  2%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.30 ±  4%     +28.8%       0.38 ±  2%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
    209424 ± 34%    +169.4%     564215 ± 12%  sched_debug.cfs_rq:/.left_deadline.avg
   2071516 ± 26%     +50.9%    3126519 ±  6%  sched_debug.cfs_rq:/.left_deadline.stddev
    209420 ± 34%    +169.4%     564200 ± 12%  sched_debug.cfs_rq:/.left_vruntime.avg
   2071477 ± 26%     +50.9%    3126440 ±  6%  sched_debug.cfs_rq:/.left_vruntime.stddev
  26243720           -28.6%   18749089        sched_debug.cfs_rq:/.min_vruntime.avg
  28003592           -25.9%   20752934 ±  4%  sched_debug.cfs_rq:/.min_vruntime.max
  25141866           -32.7%   16920920        sched_debug.cfs_rq:/.min_vruntime.min
      0.27 ±  3%     +24.1%       0.33 ±  2%  sched_debug.cfs_rq:/.nr_queued.stddev
    209420 ± 34%    +169.4%     564200 ± 12%  sched_debug.cfs_rq:/.right_vruntime.avg
   2071477 ± 26%     +50.9%    3126440 ±  6%  sched_debug.cfs_rq:/.right_vruntime.stddev
    209.56 ±  6%     +32.0%     276.63 ±  2%  sched_debug.cfs_rq:/.runnable_avg.stddev
    192.89 ±  5%     +31.7%     253.94 ±  2%  sched_debug.cfs_rq:/.util_avg.stddev
    819031 ±  3%     -65.5%     282924 ±  8%  sched_debug.cpu.avg_idle.avg
      4756 ±  2%     -37.0%       2995 ±  2%  sched_debug.cpu.avg_idle.min
   1084177           -55.0%     487973 ± 11%  sched_debug.cpu.avg_idle.stddev
    740.98 ± 24%     +55.0%       1148 ± 24%  sched_debug.cpu.clock_task.stddev
      2338 ±  5%     +22.8%       2872        sched_debug.cpu.curr->pid.stddev
    742501 ± 19%     +46.3%    1085919 ±  5%  sched_debug.cpu.max_idle_balance_cost.min
    106154 ± 12%     -44.8%      58635 ±  9%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.00 ± 17%     -20.6%       0.00 ±  3%  sched_debug.cpu.next_balance.stddev
      0.31 ±  5%     +24.8%       0.39        sched_debug.cpu.nr_running.stddev
   2579918 ±  2%    +252.5%    9093241 ±  2%  sched_debug.cpu.nr_switches.avg
   3594985          +181.0%   10103415 ±  4%  sched_debug.cpu.nr_switches.max
   1042225 ± 52%    +546.5%    6737845 ± 14%  sched_debug.cpu.nr_switches.min
    135.08 ± 23%     -24.8%     101.56 ± 18%  sched_debug.cpu.nr_uninterruptible.max
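
For readers checking the %change column by hand, here is a minimal sketch. It assumes that entries with a percent sign are relative changes against the base commit, and that entries without one (the rows whose metric name ends in "%") are plain differences in percentage points. This reading matches the numbers in the table but is not spelled out in the report; the function names are hypothetical.

```python
def relative_change(base: float, new: float) -> float:
    """Relative change in percent, measured against the base commit."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, used here for metrics already expressed in %."""
    return new - base


# (base, new) pairs copied from the table above.
relative_rows = {
    "netperf.Throughput_Mbps": (20398, 61188),
    "netperf.time.system_time": (27208, 20391),
    "perf-stat.overall.cpi": (6.66, 2.92),
}

for name, (base, new) in relative_rows.items():
    print(f"{relative_change(base, new):+8.1f}%  {name}")

# mpstat.cpu.all.sys% is itself a percentage, so the report shows a difference.
print(f"{point_change(86.78, 67.94):+8.1f}   mpstat.cpu.all.sys%")
```

Under these assumptions the sketch reproduces the +200.0%, -25.1%, -56.2% and -18.8 entries shown in the corresponding rows.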




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

