[linus:master] [net] 16c610162d: netperf.Throughput

netdev.vger.kernel.org archive mirror

* [linus:master] [net]  16c610162d:  netperf.Throughput_tps 17.2% regression
@ 2025-10-28  6:25 kernel test robot
  2025-10-28  6:57 ` Eric Dumazet
  0 siblings, 1 reply; 2+ messages in thread
From: kernel test robot @ 2025-10-28  6:25 UTC
  To: Eric Dumazet
  Cc: oe-lkp, lkp, linux-kernel, Jakub Kicinski, Kuniyuki Iwashima,
	netdev, oliver.sang



Hello,

kernel test robot noticed a 17.2% regression of netperf.Throughput_tps on:


commit: 16c610162d1f1c332209de1c91ffb09b659bb65d ("net: call cond_resched() less often in __release_sock()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[still regression on      linus/master dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa]
[still regression on linux-next/master 8fec172c82c2b5f6f8e47ab837c1dc91ee3d1b87]

testcase: netperf
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 200%
	cluster: cs-localhost
	test: TCP_CRR
	cpufreq_governor: performance
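
A minimal sketch of how the parameters above could translate into a netperf run. The exact command line used by lkp-tests is not shown in this report, so the helper names, the 127.0.0.1 target (inferred from cluster: cs-localhost), and the choice to start one netperf client per configured thread are assumptions. Only the standard netperf options -t (test name), -H (remote host), and -l (test length in seconds) are used.

```python
# Hypothetical helpers; not taken from lkp-tests.

def netperf_instances(hw_threads: int, nr_threads_pct: int) -> int:
    """Number of concurrent netperf clients for an nr_threads percentage."""
    return hw_threads * nr_threads_pct // 100


def netperf_argv(test: str, host: str, runtime_s: int) -> list[str]:
    """Build an argv list for one netperf client (assumed invocation)."""
    return ["netperf", "-t", test, "-H", host, "-l", str(runtime_s)]


if __name__ == "__main__":
    # 192 hardware threads at nr_threads: 200%
    n = netperf_instances(192, 200)
    print(n, "clients, each running:")
    print(" ".join(netperf_argv("TCP_CRR", "127.0.0.1", 300)))
```

With 192 hardware threads and nr_threads: 200%, this gives 384 clients, which is consistent with the ratio of netperf.Throughput_total_tps to netperf.Throughput_tps in the results table (2874230 / 7484 is about 384).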




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202510281337.398a9aa9-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251028/202510281337.398a9aa9-lkp@intel.com

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
  cs-localhost/gcc-14/performance/ipv4/x86_64-rhel-9.4/200%/debian-13-x86_64-20250902.cgz/300s/lkp-srf-2sp3/TCP_CRR/netperf

commit: 
  abfa70b380 ("Merge branch 'tcp-__tcp_close-changes'")
  16c610162d ("net: call cond_resched() less often in __release_sock()")

abfa70b380348cf4 16c610162d1f1c332209de1c91f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.80            -0.4        2.43 ±  3%  mpstat.cpu.all.usr%
    199581 ± 96%     -75.4%      49072 ± 64%  numa-meminfo.node0.Mapped
   6583442 ±  6%     -30.2%    4594175 ±  5%  numa-numastat.node0.local_node
   6709344 ±  6%     -30.4%    4672973 ±  5%  numa-numastat.node0.numa_hit
     50277 ± 96%     -75.4%      12383 ± 63%  numa-vmstat.node0.nr_mapped
   6708267 ±  6%     -30.3%    4672365 ±  5%  numa-vmstat.node0.numa_hit
   6582364 ±  6%     -30.2%    4593568 ±  5%  numa-vmstat.node0.numa_local
    224.83 ±100%    +224.8%     730.17 ± 36%  perf-c2c.DRAM.local
      1438 ±100%    +132.4%       3343 ± 11%  perf-c2c.DRAM.remote
      1569 ±100%    +115.5%       3383 ± 10%  perf-c2c.HITM.local
      1089 ±100%    +121.1%       2408 ± 10%  perf-c2c.HITM.remote
  14776381 ±  9%     -21.6%   11587148 ±  8%  proc-vmstat.numa_hit
  14576750 ±  9%     -21.9%   11387471 ±  8%  proc-vmstat.numa_local
  51492399 ±  6%     -26.1%   38054262 ±  5%  proc-vmstat.pgalloc_normal
  48277971 ±  5%     -26.9%   35310227 ±  5%  proc-vmstat.pgfree
   2874230           -17.2%    2379822        netperf.ThroughputBoth_total_tps
      7484           -17.2%       6197        netperf.ThroughputBoth_tps
   2874230           -17.2%    2379822        netperf.Throughput_total_tps
      7484           -17.2%       6197        netperf.Throughput_tps
 1.351e+09           -13.7%  1.165e+09        netperf.time.involuntary_context_switches
      9145            +7.8%       9855        netperf.time.percent_of_cpu_this_job_got
     27055            +8.4%      29322        netperf.time.system_time
    927.87           -11.1%     824.49        netperf.time.user_time
 1.975e+08 ±  5%     -28.2%  1.418e+08 ±  6%  netperf.time.voluntary_context_switches
 8.623e+08           -17.2%  7.139e+08        netperf.workload
   7908218 ±  8%     +33.3%   10540980 ±  7%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      2.27           -10.2%       2.04        sched_debug.cfs_rq:/.h_nr_queued.avg
     11.92 ±  7%     -18.9%       9.67 ±  8%  sched_debug.cfs_rq:/.h_nr_queued.max
      2.33 ±  5%     -13.6%       2.02 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      5.14 ± 27%     -50.8%       2.53 ± 51%  sched_debug.cfs_rq:/.load_avg.min
   7908224 ±  8%     +33.3%   10540996 ±  7%  sched_debug.cfs_rq:/.min_vruntime.stddev
    245718 ±  4%     -10.4%     220184 ±  8%  sched_debug.cpu.max_idle_balance_cost.stddev
      2.26           -10.2%       2.03        sched_debug.cpu.nr_running.avg
      2.33 ±  5%     -13.8%       2.01 ±  4%  sched_debug.cpu.nr_running.stddev
   8021905           -16.0%    6738879        sched_debug.cpu.nr_switches.avg
  10163286           -20.5%    8082726 ±  2%  sched_debug.cpu.nr_switches.max
   1494738 ± 14%     -50.1%     745542 ±  9%  sched_debug.cpu.nr_switches.stddev
 6.417e+10           -16.1%  5.383e+10        perf-stat.i.branch-instructions
      0.52            -0.0        0.49        perf-stat.i.branch-miss-rate%
 3.329e+08           -21.1%  2.628e+08        perf-stat.i.branch-misses
  49601635 ±  8%     -15.1%   42090142 ±  6%  perf-stat.i.cache-misses
 2.238e+08           -11.6%  1.979e+08 ±  2%  perf-stat.i.cache-references
  10160912           -15.7%    8567209        perf-stat.i.context-switches
      1.74           +20.0%       2.09        perf-stat.i.cpi
      2679 ±  7%     -22.9%       2067 ±  3%  perf-stat.i.cpu-migrations
     12544 ±  7%     +17.2%      14707 ±  5%  perf-stat.i.cycles-between-cache-misses
 3.464e+11           -16.3%  2.898e+11        perf-stat.i.instructions
      0.58           -16.4%       0.49        perf-stat.i.ipc
     52.92           -15.7%      44.62        perf-stat.i.metric.K/sec
      0.52            -0.0        0.49        perf-stat.overall.branch-miss-rate%
      1.74           +19.4%       2.07        perf-stat.overall.cpi
     12209 ±  8%     +17.3%      14320 ±  6%  perf-stat.overall.cycles-between-cache-misses
      0.58           -16.3%       0.48        perf-stat.overall.ipc
    122980            +1.1%     124361        perf-stat.overall.path-length
 6.398e+10           -16.1%  5.367e+10        perf-stat.ps.branch-instructions
 3.319e+08           -21.1%   2.62e+08        perf-stat.ps.branch-misses
  49465671 ±  8%     -15.1%   41971976 ±  6%  perf-stat.ps.cache-misses
 2.231e+08           -11.6%  1.973e+08 ±  2%  perf-stat.ps.cache-references
  10129507           -15.7%    8540638        perf-stat.ps.context-switches
      2669 ±  7%     -22.8%       2061 ±  3%  perf-stat.ps.cpu-migrations
 3.454e+11           -16.3%   2.89e+11        perf-stat.ps.instructions
  1.06e+14           -16.3%  8.879e+13        perf-stat.total.instructions
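
For readers checking the table, the %change column appears to be the relative difference between the two commits' mean values. The sketch below recomputes it for a few rows. The function name is hypothetical, and the formula is inferred from the numbers in the table rather than taken from the lkp-tests source.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# (base commit abfa70b380, tested commit 16c610162d), copied from the table
rows = {
    "netperf.Throughput_tps": (7484, 6197),
    "perf-stat.ps.context-switches": (10129507, 8540638),
    "netperf.time.system_time": (27055, 29322),
}

if __name__ == "__main__":
    for name, (base, new) in rows.items():
        print(f"{pct_change(base, new):+6.1f}%  {name}")
```

These three rows reproduce the reported -17.2%, -15.7%, and +8.4%. Rows printed with only two or three significant digits (for example perf-stat.overall.cpi, 1.74 to 2.07) come out slightly different when recomputed this way, presumably because the report works from unrounded values. Rows whose metric name ends in % (such as mpstat.cpu.all.usr%) show an absolute difference instead of a relative one.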




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [linus:master] [net] 16c610162d: netperf.Throughput_tps 17.2% regression
  2025-10-28  6:25 [linus:master] [net] 16c610162d: netperf.Throughput_tps 17.2% regression kernel test robot
@ 2025-10-28  6:57 ` Eric Dumazet
  0 siblings, 0 replies; 2+ messages in thread
From: Eric Dumazet @ 2025-10-28  6:57 UTC
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-kernel, Jakub Kicinski, Kuniyuki Iwashima,
	netdev

On Mon, Oct 27, 2025 at 11:26 PM kernel test robot
<oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a 17.2% regression of netperf.Throughput_tps on:
>
>
> commit: 16c610162d1f1c332209de1c91ffb09b659bb65d ("net: call cond_resched() less often in __release_sock()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [still regression on      linus/master dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa]
> [still regression on linux-next/master 8fec172c82c2b5f6f8e47ab837c1dc91ee3d1b87]
>
> testcase: netperf
> config: x86_64-rhel-9.4
> compiler: gcc-14
> test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
> parameters:
>
>         ip: ipv4
>         runtime: 300s
>         nr_threads: 200%
>         cluster: cs-localhost
>         test: TCP_CRR
>         cpufreq_governor: performance
>
>
>

I will not consider this as a regression.

If anyone is interested, they would have to investigate if TCP_CRR on
localhost is a really interesting metric, and why this would depend on
cond_resched() in __release_sock().

Thank you.

>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202510281337.398a9aa9-lkp@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20251028/202510281337.398a9aa9-lkp@intel.com
>
> =========================================================================================
> cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
>   cs-localhost/gcc-14/performance/ipv4/x86_64-rhel-9.4/200%/debian-13-x86_64-20250902.cgz/300s/lkp-srf-2sp3/TCP_CRR/netperf
>
> commit:
>   abfa70b380 ("Merge branch 'tcp-__tcp_close-changes'")
>   16c610162d ("net: call cond_resched() less often in __release_sock()")
>
> abfa70b380348cf4 16c610162d1f1c332209de1c91f
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>       2.80            -0.4        2.43 ±  3%  mpstat.cpu.all.usr%
>     199581 ± 96%     -75.4%      49072 ± 64%  numa-meminfo.node0.Mapped
>    6583442 ±  6%     -30.2%    4594175 ±  5%  numa-numastat.node0.local_node
>    6709344 ±  6%     -30.4%    4672973 ±  5%  numa-numastat.node0.numa_hit
>      50277 ± 96%     -75.4%      12383 ± 63%  numa-vmstat.node0.nr_mapped
>    6708267 ±  6%     -30.3%    4672365 ±  5%  numa-vmstat.node0.numa_hit
>    6582364 ±  6%     -30.2%    4593568 ±  5%  numa-vmstat.node0.numa_local
>     224.83 ±100%    +224.8%     730.17 ± 36%  perf-c2c.DRAM.local
>       1438 ±100%    +132.4%       3343 ± 11%  perf-c2c.DRAM.remote
>       1569 ±100%    +115.5%       3383 ± 10%  perf-c2c.HITM.local
>       1089 ±100%    +121.1%       2408 ± 10%  perf-c2c.HITM.remote
>   14776381 ±  9%     -21.6%   11587148 ±  8%  proc-vmstat.numa_hit
>   14576750 ±  9%     -21.9%   11387471 ±  8%  proc-vmstat.numa_local
>   51492399 ±  6%     -26.1%   38054262 ±  5%  proc-vmstat.pgalloc_normal
>   48277971 ±  5%     -26.9%   35310227 ±  5%  proc-vmstat.pgfree
>    2874230           -17.2%    2379822        netperf.ThroughputBoth_total_tps
>       7484           -17.2%       6197        netperf.ThroughputBoth_tps
>    2874230           -17.2%    2379822        netperf.Throughput_total_tps
>       7484           -17.2%       6197        netperf.Throughput_tps
>  1.351e+09           -13.7%  1.165e+09        netperf.time.involuntary_context_switches
>       9145            +7.8%       9855        netperf.time.percent_of_cpu_this_job_got
>      27055            +8.4%      29322        netperf.time.system_time
>     927.87           -11.1%     824.49        netperf.time.user_time
>  1.975e+08 ±  5%     -28.2%  1.418e+08 ±  6%  netperf.time.voluntary_context_switches
>  8.623e+08           -17.2%  7.139e+08        netperf.workload
>    7908218 ±  8%     +33.3%   10540980 ±  7%  sched_debug.cfs_rq:/.avg_vruntime.stddev
>       2.27           -10.2%       2.04        sched_debug.cfs_rq:/.h_nr_queued.avg
>      11.92 ±  7%     -18.9%       9.67 ±  8%  sched_debug.cfs_rq:/.h_nr_queued.max
>       2.33 ±  5%     -13.6%       2.02 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.stddev
>       5.14 ± 27%     -50.8%       2.53 ± 51%  sched_debug.cfs_rq:/.load_avg.min
>    7908224 ±  8%     +33.3%   10540996 ±  7%  sched_debug.cfs_rq:/.min_vruntime.stddev
>     245718 ±  4%     -10.4%     220184 ±  8%  sched_debug.cpu.max_idle_balance_cost.stddev
>       2.26           -10.2%       2.03        sched_debug.cpu.nr_running.avg
>       2.33 ±  5%     -13.8%       2.01 ±  4%  sched_debug.cpu.nr_running.stddev
>    8021905           -16.0%    6738879        sched_debug.cpu.nr_switches.avg
>   10163286           -20.5%    8082726 ±  2%  sched_debug.cpu.nr_switches.max
>    1494738 ± 14%     -50.1%     745542 ±  9%  sched_debug.cpu.nr_switches.stddev
>  6.417e+10           -16.1%  5.383e+10        perf-stat.i.branch-instructions
>       0.52            -0.0        0.49        perf-stat.i.branch-miss-rate%
>  3.329e+08           -21.1%  2.628e+08        perf-stat.i.branch-misses
>   49601635 ±  8%     -15.1%   42090142 ±  6%  perf-stat.i.cache-misses
>  2.238e+08           -11.6%  1.979e+08 ±  2%  perf-stat.i.cache-references
>   10160912           -15.7%    8567209        perf-stat.i.context-switches
>       1.74           +20.0%       2.09        perf-stat.i.cpi
>       2679 ±  7%     -22.9%       2067 ±  3%  perf-stat.i.cpu-migrations
>      12544 ±  7%     +17.2%      14707 ±  5%  perf-stat.i.cycles-between-cache-misses
>  3.464e+11           -16.3%  2.898e+11        perf-stat.i.instructions
>       0.58           -16.4%       0.49        perf-stat.i.ipc
>      52.92           -15.7%      44.62        perf-stat.i.metric.K/sec
>       0.52            -0.0        0.49        perf-stat.overall.branch-miss-rate%
>       1.74           +19.4%       2.07        perf-stat.overall.cpi
>      12209 ±  8%     +17.3%      14320 ±  6%  perf-stat.overall.cycles-between-cache-misses
>       0.58           -16.3%       0.48        perf-stat.overall.ipc
>     122980            +1.1%     124361        perf-stat.overall.path-length
>  6.398e+10           -16.1%  5.367e+10        perf-stat.ps.branch-instructions
>  3.319e+08           -21.1%   2.62e+08        perf-stat.ps.branch-misses
>   49465671 ±  8%     -15.1%   41971976 ±  6%  perf-stat.ps.cache-misses
>  2.231e+08           -11.6%  1.973e+08 ±  2%  perf-stat.ps.cache-references
>   10129507           -15.7%    8540638        perf-stat.ps.context-switches
>       2669 ±  7%     -22.8%       2061 ±  3%  perf-stat.ps.cpu-migrations
>  3.454e+11           -16.3%   2.89e+11        perf-stat.ps.instructions
>   1.06e+14           -16.3%  8.879e+13        perf-stat.total.instructions
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
