All of lore.kernel.org
 help / color / mirror / Atom feed
From: kernel test robot <oliver.sang@intel.com>
To: Fernand Sieber <sieberf@amazon.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>, <aubrey.li@linux.intel.com>,
	<yu.c.chen@intel.com>, <peterz@infradead.org>,
	<bsegall@google.com>, <dietmar.eggemann@arm.com>,
	<dwmw@amazon.co.uk>, <graf@amazon.com>, <jschoenh@amazon.de>,
	<juri.lelli@redhat.com>, <mingo@redhat.com>, <sieberf@amazon.com>,
	<tanghui20@huawei.com>, <vincent.guittot@linaro.org>,
	<vineethr@linux.ibm.com>, <wangtao554@huawei.com>,
	<zhangqiao22@huawei.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH v3] sched/fair: Forfeit vruntime on yield
Date: Fri, 26 Sep 2025 12:56:49 +0800	[thread overview]
Message-ID: <202509261113.a87577ce-lkp@intel.com> (raw)
In-Reply-To: <20250918150528.292620-1-sieberf@amazon.com>


Hello,


we reported "a 55.9% improvement of stress-ng.wait.ops_per_sec"
in https://lore.kernel.org/all/202509241501.f14b210a-lkp@intel.com/

now we noticed there is also a regression in our tests. report again FYI.

one thing we want to mention is the "stress-ng.sockpair.MB_written_per_sec" is
in "miscellaneous metrics" of this stress-ng test. for major part,
"stress-ng.sockpair.ops_per_sec", it's just a small difference.

0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    551.38           -90.5%      52.18        stress-ng.sockpair.MB_written_per_sec
    781743            -2.3%     764106        stress-ng.sockpair.ops_per_sec


below is a test example for 15bf8c7b35:

2025-09-25 15:48:21 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --oom-avoid --sockpair 192
stress-ng: info:  [8371] setting to a 1 min run per stressor
stress-ng: info:  [8371] dispatching hogs: 192 sockpair
stress-ng: info:  [8371] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
stress-ng: metrc: [8371] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [8371]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [8371] sockpair       49874197     65.44     72.08  12219.54    762108.28        4057.58        97.82          3132
stress-ng: metrc: [8371] miscellaneous metrics:
stress-ng: metrc: [8371] sockpair           27717.04 socketpair calls sec (harmonic mean of 192 instances)
stress-ng: metrc: [8371] sockpair              53.01 MB written per sec (harmonic mean of 192 instances)
stress-ng: info:  [8371] for a 66.13s run time:
stress-ng: info:  [8371]   12696.46s available CPU time
stress-ng: info:  [8371]      72.07s user time   (  0.57%)
stress-ng: info:  [8371]   12219.63s system time ( 96.24%)
stress-ng: info:  [8371]   12291.70s total time  ( 96.81%)
stress-ng: info:  [8371] load average: 190.99 57.46 19.94
stress-ng: info:  [8371] skipped: 0
stress-ng: info:  [8371] passed: 192: sockpair (192)
stress-ng: info:  [8371] failed: 0
stress-ng: info:  [8371] metrics untrustworthy: 0
stress-ng: info:  [8371] successful run completed in 1 min, 6.13 secs


below is an exmple from 0d4eaf8caf:

2025-09-25 18:04:37 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --oom-avoid --sockpair 192
stress-ng: info:  [8360] setting to a 1 min run per stressor
stress-ng: info:  [8360] dispatching hogs: 192 sockpair
stress-ng: info:  [8360] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
stress-ng: metrc: [8360] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [8360]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [8360] sockpair       51705787     65.08     56.75  12254.39    794448.25        4199.92        98.52          5160
stress-ng: metrc: [8360] miscellaneous metrics:
stress-ng: metrc: [8360] sockpair           28156.62 socketpair calls sec (harmonic mean of 192 instances)
stress-ng: metrc: [8360] sockpair             562.18 MB written per sec (harmonic mean of 192 instances)
stress-ng: info:  [8360] for a 65.40s run time:
stress-ng: info:  [8360]   12556.08s available CPU time
stress-ng: info:  [8360]      56.75s user time   (  0.45%)
stress-ng: info:  [8360]   12254.48s system time ( 97.60%)
stress-ng: info:  [8360]   12311.23s total time  ( 98.05%)
stress-ng: info:  [8360] load average: 239.81 72.31 25.10
stress-ng: info:  [8360] skipped: 0
stress-ng: info:  [8360] passed: 192: sockpair (192)
stress-ng: info:  [8360] failed: 0
stress-ng: info:  [8360] metrics untrustworthy: 0
stress-ng: info:  [8360] successful run completed in 1 min, 5.40 secs


below is full report.


kernel test robot noticed a 90.5% regression of stress-ng.sockpair.MB_written_per_sec on:


commit: 15bf8c7b35e31295b26241425c0a61102e92109f ("[PATCH v3] sched/fair: Forfeit vruntime on yield")
url: https://github.com/intel-lab-lkp/linux/commits/Fernand-Sieber/sched-fair-Forfeit-vruntime-on-yield/20250918-231320
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 0d4eaf8caf8cd633b23e949e2996b420052c2d45
patch link: https://lore.kernel.org/all/20250918150528.292620-1-sieberf@amazon.com/
patch subject: [PATCH v3] sched/fair: Forfeit vruntime on yield

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockpair
	cpufreq_governor: performance



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202509261113.a87577ce-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250926/202509261113.a87577ce-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp3/sockpair/stress-ng/60s

commit: 
  0d4eaf8caf ("sched/fair: Do not balance task to a throttled cfs_rq")
  15bf8c7b35 ("sched/fair: Forfeit vruntime on yield")

0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.78 ±  2%      +0.2        1.02        mpstat.cpu.all.usr%
     19.57           -36.8%      12.36 ± 70%  turbostat.RAMWatt
 4.073e+08 ±  6%     +23.1%  5.013e+08 ±  5%  cpuidle..time
    266261 ±  9%     +46.4%     389733 ±  9%  cpuidle..usage
    451887 ± 77%    +160.9%    1178929 ± 33%  numa-vmstat.node0.nr_file_pages
    192819 ± 30%    +101.3%     388191 ± 43%  numa-vmstat.node1.nr_shmem
   1807416 ± 77%    +161.0%    4716665 ± 33%  numa-meminfo.node0.FilePages
   8980121            -9.0%    8174177        numa-meminfo.node0.SUnreclaim
  25356157 ±  8%     -22.0%   19772595 ±  9%  numa-meminfo.node1.MemUsed
    771480 ± 30%    +101.4%    1553932 ± 43%  numa-meminfo.node1.Shmem
    551.38           -90.5%      52.18        stress-ng.sockpair.MB_written_per_sec
  51092272            -2.2%   49968621        stress-ng.sockpair.ops
    781743            -2.3%     764106        stress-ng.sockpair.ops_per_sec
  21418332 ±  4%     +69.2%   36232510        stress-ng.time.involuntary_context_switches
     56.36           +27.4%      71.81        stress-ng.time.user_time
    150809 ± 21%  +17217.1%   26115838 ±  3%  stress-ng.time.voluntary_context_switches
   2165914 ±  7%     +92.3%    4165197 ±  4%  meminfo.Active
   2165898 ±  7%     +92.3%    4165181 ±  4%  meminfo.Active(anon)
   4926568           +39.6%    6875228        meminfo.Cached
   6826363           +28.1%    8744371        meminfo.Committed_AS
    513281 ±  8%     +98.7%    1019681 ±  6%  meminfo.Mapped
  48472806 ±  2%     -14.8%   41314088        meminfo.Memused
   1276164          +152.7%    3224818 ±  3%  meminfo.Shmem
  53022761 ±  2%     -15.7%   44672632        meminfo.max_used_kB
      0.53           -81.0%       0.10 ±  4%  perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
      0.53           -81.0%       0.10 ±  4%  perf-sched.total_sch_delay.average.ms
      2.03           -68.4%       0.64 ±  4%  perf-sched.total_wait_and_delay.average.ms
   1811449          +200.9%    5449776 ±  4%  perf-sched.total_wait_and_delay.count.ms
      1.50           -64.0%       0.54 ±  4%  perf-sched.total_wait_time.average.ms
      2.03           -68.4%       0.64 ±  4%  perf-sched.wait_and_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
   1811449          +200.9%    5449776 ±  4%  perf-sched.wait_and_delay.count.[unknown].[unknown].[unknown].[unknown].[unknown]
      1.50           -64.0%       0.54 ±  4%  perf-sched.wait_time.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
    541937 ±  7%     +92.5%    1043389 ±  4%  proc-vmstat.nr_active_anon
   5242293            +3.5%    5423918        proc-vmstat.nr_dirty_background_threshold
  10497404            +3.5%   10861099        proc-vmstat.nr_dirty_threshold
   1232280           +39.7%    1721251        proc-vmstat.nr_file_pages
  52782357            +3.4%   54601330        proc-vmstat.nr_free_pages
  52117733            +3.8%   54073313        proc-vmstat.nr_free_pages_blocks
    128259 ±  8%    +100.8%     257594 ±  6%  proc-vmstat.nr_mapped
    319681          +153.0%     808650 ±  3%  proc-vmstat.nr_shmem
   4489133            -8.9%    4089704        proc-vmstat.nr_slab_unreclaimable
    541937 ±  7%     +92.5%    1043389 ±  4%  proc-vmstat.nr_zone_active_anon
  77303955            +2.5%   79201972        proc-vmstat.pgalloc_normal
    519724            +5.2%     546556        proc-vmstat.pgfault
  76456707            +1.7%   77739095        proc-vmstat.pgfree
  12794131 ±  6%     -27.4%    9288185        sched_debug.cfs_rq:/.avg_vruntime.max
   4610143 ±  8%     -14.9%    3923890 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.min
      1.03           -20.1%       0.83 ±  2%  sched_debug.cfs_rq:/.h_nr_queued.avg
      1.03           -20.8%       0.82 ±  2%  sched_debug.cfs_rq:/.h_nr_runnable.avg
    895.00 ± 70%     +89.0%       1691 ±  2%  sched_debug.cfs_rq:/.load.min
      0.67 ± 55%    +125.0%       1.50        sched_debug.cfs_rq:/.load_avg.min
  12794131 ±  6%     -27.4%    9288185        sched_debug.cfs_rq:/.min_vruntime.max
   4610143 ±  8%     -14.9%    3923896 ±  5%  sched_debug.cfs_rq:/.min_vruntime.min
      1103           -20.2%     880.86        sched_debug.cfs_rq:/.runnable_avg.avg
    428.26 ±  6%     -63.4%     156.94 ± 22%  sched_debug.cfs_rq:/.util_est.avg
      1775 ±  6%     -39.3%       1077 ± 15%  sched_debug.cfs_rq:/.util_est.max
    396.33 ±  6%     -50.0%     198.03 ± 17%  sched_debug.cfs_rq:/.util_est.stddev
     50422 ±  6%     -34.7%      32915 ± 18%  sched_debug.cpu.avg_idle.min
    456725 ± 10%     +39.4%     636811 ±  4%  sched_debug.cpu.avg_idle.stddev
    611566 ±  5%     +25.0%     764424 ±  2%  sched_debug.cpu.max_idle_balance_cost.avg
    190657 ± 12%     +36.1%     259410 ±  5%  sched_debug.cpu.max_idle_balance_cost.stddev
      1.04           -20.4%       0.82 ±  2%  sched_debug.cpu.nr_running.avg
     57214 ±  4%    +183.5%     162228 ±  2%  sched_debug.cpu.nr_switches.avg
    253314 ±  4%     +39.3%     352777 ±  4%  sched_debug.cpu.nr_switches.max
     59410 ±  6%     +31.6%      78186 ± 10%  sched_debug.cpu.nr_switches.stddev
      3.33           -27.9%       2.40        perf-stat.i.MPKI
 1.207e+10           +11.3%  1.344e+10        perf-stat.i.branch-instructions
      0.21 ±  7%      +0.0        0.24 ±  5%  perf-stat.i.branch-miss-rate%
  23462655 ±  6%     +27.4%   29896517 ±  3%  perf-stat.i.branch-misses
     75.74            -4.4       71.33        perf-stat.i.cache-miss-rate%
 1.861e+08           -21.5%  1.462e+08        perf-stat.i.cache-misses
 2.435e+08           -17.1%  2.017e+08        perf-stat.i.cache-references
    323065 ±  5%    +191.4%     941425 ±  2%  perf-stat.i.context-switches
     10.73            -9.7%       9.69        perf-stat.i.cpi
    353.45           +39.0%     491.13 ±  4%  perf-stat.i.cpu-migrations
      3589           +30.5%       4685        perf-stat.i.cycles-between-cache-misses
 5.645e+10           +12.0%  6.323e+10        perf-stat.i.instructions
      0.09           +12.1%       0.11        perf-stat.i.ipc
      1.66 ±  5%    +193.9%       4.89 ±  2%  perf-stat.i.metric.K/sec
      6247            +5.7%       6603 ±  2%  perf-stat.i.minor-faults
      6248            +5.7%       6604 ±  2%  perf-stat.i.page-faults
      3.33           -29.7%       2.34        perf-stat.overall.MPKI
      0.20 ±  7%      +0.0        0.23 ±  4%  perf-stat.overall.branch-miss-rate%
     76.67            -3.9       72.79        perf-stat.overall.cache-miss-rate%
     10.54           -11.1%       9.37        perf-stat.overall.cpi
      3168           +26.5%       4007        perf-stat.overall.cycles-between-cache-misses
      0.09           +12.5%       0.11        perf-stat.overall.ipc
 1.204e+10           +11.1%  1.337e+10        perf-stat.ps.branch-instructions
  23586580 ±  7%     +29.7%   30600100 ±  4%  perf-stat.ps.branch-misses
 1.873e+08           -21.4%  1.471e+08        perf-stat.ps.cache-misses
 2.443e+08           -17.3%  2.021e+08        perf-stat.ps.cache-references
    324828 ±  5%    +187.0%     932274 ±  2%  perf-stat.ps.context-switches
    335.13 ±  2%     +41.7%     474.95 ±  5%  perf-stat.ps.cpu-migrations
 5.632e+10           +11.7%  6.293e+10        perf-stat.ps.instructions
      6282            +6.5%       6690 ±  2%  perf-stat.ps.minor-faults
      6284            +6.5%       6692 ±  2%  perf-stat.ps.page-faults
 3.764e+12           +12.2%  4.224e+12        perf-stat.total.instructions



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


  parent reply	other threads:[~2025-09-26  4:57 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-11  9:51 [PATCH RESEND] sched/fair: Only increment deadline once on yield Fernand Sieber
2025-09-11 11:03 ` Alexander Graf
2025-09-11 11:37 ` Peter Zijlstra
2025-09-11 13:56   ` Peter Zijlstra
2025-09-16 13:35   ` Fernand Sieber
2025-09-16 14:02 ` [PATCH v2] sched/fair: Forfeit vruntime " Fernand Sieber
2025-09-16 16:00   ` Fernand Sieber
2025-09-18  6:43     ` Peter Zijlstra
2025-09-18 10:21       ` Peter Zijlstra
2025-09-18 15:05         ` [PATCH v3] " Fernand Sieber
2025-09-24  8:25           ` kernel test robot
2025-09-26  4:56           ` kernel test robot [this message]
2025-10-16  9:33           ` [tip: sched/core] " tip-bot2 for Fernand Sieber
2025-11-05  9:13           ` [PATCH v4] " Fernand Sieber
2025-09-17 19:22   ` [PATCH v2] " Fernand Sieber
2025-09-18  2:45   ` Xuewen Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202509261113.a87577ce-lkp@intel.com \
    --to=oliver.sang@intel.com \
    --cc=aubrey.li@linux.intel.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=dwmw@amazon.co.uk \
    --cc=graf@amazon.com \
    --cc=jschoenh@amazon.de \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=mingo@redhat.com \
    --cc=oe-lkp@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=sieberf@amazon.com \
    --cc=tanghui20@huawei.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vineethr@linux.ibm.com \
    --cc=wangtao554@huawei.com \
    --cc=yu.c.chen@intel.com \
    --cc=zhangqiao22@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.