* [linux-next:master] [vmalloc]  60ced5818f: stress-ng.shm.ops_per_sec 7.2% regression
@ 2026-06-02 13:18 kernel test robot
  2026-06-09  7:35 ` Yeoreum Yun
  0 siblings, 1 reply; 5+ messages in thread
From: kernel test robot @ 2026-06-02 13:18 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: oe-lkp, lkp, Andrew Morton, Muhammad Usama Anjum, Vlastimil Babka,
	Zi Yan, David Hildenbrand, Uladzislau Rezki, Brendan Jackman,
	David Sterba, Johannes Weiner, Liam Howlett, Lorenzo Stoakes,
	Michal Hocko, Mike Rapoport, Nick Terrell, Suren Baghdasaryan,
	Vishal Moola, linux-mm, oliver.sang



Hello,

kernel test robot noticed a 7.2% regression of stress-ng.shm.ops_per_sec on:


commit: 60ced5818f64ac356620d1ad3e0d473c457dbf5b ("vmalloc: optimize vfree with free_pages_bulk()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

[still regression on linux-next/master 7da7f07112610a520567421dd2ffcb51beaefbcc]

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: shm
	cpufreq_governor: performance



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202606022131.112319f2-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260602/202606022131.112319f2-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp3/shm/stress-ng/60s

commit: 
  4aa4abf1f1 ("mm/page_alloc: optimize free_contig_range()")
  60ced5818f ("vmalloc: optimize vfree with free_pages_bulk()")

4aa4abf1f14bd6d0 60ced5818f64ac356620d1ad3e0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1103082            -7.2%    1024016        stress-ng.shm.ops
     18394            -7.2%      17072        stress-ng.shm.ops_per_sec
   2084775           -23.1%    1602693 ±  3%  stress-ng.time.involuntary_context_switches
 3.076e+08            -7.3%  2.852e+08        stress-ng.time.minor_page_faults
     14759           -21.1%      11646        stress-ng.time.percent_of_cpu_this_job_got
      8806           -21.1%       6946        stress-ng.time.system_time
     86.08            -7.5%      79.66        stress-ng.time.user_time
   2799689            -1.3%    2762252        stress-ng.time.voluntary_context_switches
 1.125e+09 ±  6%     +42.3%  1.601e+09 ± 13%  cpuidle..time
   2089564 ±  3%     +25.6%    2624591 ±  2%  cpuidle..usage
    360.33           -22.5%     279.35 ± 44%  turbostat.PkgWatt
     36.35           -22.6%      28.12 ± 44%  turbostat.RAMWatt
    198.28 ±  2%      +9.2%     216.55        vmstat.procs.r
    131962            -6.3%     123686        vmstat.system.cs
  14039730 ±  2%     -18.1%   11503879 ± 18%  numa-meminfo.node0.MemUsed
    175943 ± 13%    +114.1%     376780 ±  2%  numa-meminfo.node0.PageTables
    184811 ± 13%    +105.7%     380127 ±  4%  numa-meminfo.node1.PageTables
      9.32 ±  5%      +3.2       12.57 ± 11%  mpstat.cpu.all.idle%
      3.69 ±  9%      +5.6        9.29 ±  4%  mpstat.cpu.all.soft%
     85.37            -8.6       76.73        mpstat.cpu.all.sys%
      1.38            -0.2        1.19 ±  3%  mpstat.cpu.all.usr%
 4.555e+08            -7.4%  4.216e+08        numa-numastat.node0.local_node
 4.557e+08            -7.5%  4.217e+08        numa-numastat.node0.numa_hit
 4.493e+08            -6.8%  4.187e+08        numa-numastat.node1.local_node
 4.494e+08            -6.8%  4.189e+08        numa-numastat.node1.numa_hit
    193547            +7.4%     207908 ±  3%  perf-stat.i.cpu-clock
    193547            +7.4%     207908 ±  3%  perf-stat.i.task-clock
    135837            -8.5%     124266 ±  2%  perf-stat.ps.context-switches
   5141774           -11.4%    4555341 ±  2%  perf-stat.ps.minor-faults
   5141776           -11.4%    4555343 ±  2%  perf-stat.ps.page-faults
    194174           +12.8%     219047        meminfo.KReclaimable
  24232028            -8.6%   22159275        meminfo.Memused
    354997 ± 14%    +111.6%     751255 ±  4%  meminfo.PageTables
    194174           +12.8%     219047        meminfo.SReclaimable
    563220           +16.5%     656030        meminfo.SUnreclaim
    757395           +15.5%     875078        meminfo.Slab
    350188           +11.4%     390033        meminfo.VmallocUsed
  26142507            -9.8%   23588567        meminfo.max_used_kB
     43483 ± 14%    +119.0%      95246 ±  2%  numa-vmstat.node0.nr_page_table_pages
     43257 ±  3%     +12.4%      48605        numa-vmstat.node0.nr_vmalloc
 4.557e+08            -7.5%  4.217e+08        numa-vmstat.node0.numa_hit
 4.555e+08            -7.4%  4.216e+08        numa-vmstat.node0.numa_local
     45890 ± 13%    +109.3%      96035 ±  4%  numa-vmstat.node1.nr_page_table_pages
     44555 ±  4%      +9.3%      48699        numa-vmstat.node1.nr_vmalloc
 4.494e+08            -6.8%  4.189e+08        numa-vmstat.node1.numa_hit
 4.493e+08            -6.8%  4.187e+08        numa-vmstat.node1.numa_local
      0.16 ± 16%    +707.7%       1.30 ± 44%  perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
    346.29 ± 85%   +1061.2%       4021 ± 38%  perf-sched.sch_delay.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
      0.16 ± 16%    +707.7%       1.30 ± 44%  perf-sched.total_sch_delay.average.ms
    346.29 ± 85%   +1061.2%       4021 ± 38%  perf-sched.total_sch_delay.max.ms
      7.34           +58.2%      11.61 ± 13%  perf-sched.total_wait_and_delay.average.ms
      4478 ±  6%     +49.6%       6697 ± 20%  perf-sched.total_wait_and_delay.max.ms
      7.18           +43.6%      10.31 ± 10%  perf-sched.total_wait_time.average.ms
      4477 ±  5%     +29.7%       5809 ± 17%  perf-sched.total_wait_time.max.ms
      7.34           +58.2%      11.61 ± 13%  perf-sched.wait_and_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
      4478 ±  6%     +49.6%       6697 ± 20%  perf-sched.wait_and_delay.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
      7.18           +43.6%      10.31 ± 10%  perf-sched.wait_time.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
      4477 ±  5%     +29.7%       5809 ± 17%  perf-sched.wait_time.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
   1975577            -4.0%    1896474        proc-vmstat.nr_active_anon
    973349            -2.6%     947803        proc-vmstat.nr_anon_pages
     46138            +3.1%      47546        proc-vmstat.nr_kernel_stack
     90193 ± 14%    +110.0%     189399 ±  4%  proc-vmstat.nr_page_table_pages
     48563           +12.8%      54769        proc-vmstat.nr_slab_reclaimable
    140817           +16.5%     164023        proc-vmstat.nr_slab_unreclaimable
     87646           +11.2%      97438        proc-vmstat.nr_vmalloc
   1975576            -4.0%    1896478        proc-vmstat.nr_zone_active_anon
 9.051e+08            -7.1%  8.406e+08        proc-vmstat.numa_hit
 9.048e+08            -7.1%  8.403e+08        proc-vmstat.numa_local
 9.069e+08            -7.1%  8.421e+08        proc-vmstat.pgalloc_normal
 3.538e+08            -7.5%  3.273e+08        proc-vmstat.pgfault
 9.061e+08            -7.1%  8.414e+08        proc-vmstat.pgfree
     29261           -10.3%      26241        sched_debug.cfs_rq:/.avg_vruntime.avg
      0.58 ±  5%     +13.8%       0.66 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.avg
      0.58 ±  5%     +13.4%       0.66 ±  4%  sched_debug.cfs_rq:/.h_nr_runnable.avg
      4034 ± 33%     +53.7%       6200 ± 13%  sched_debug.cfs_rq:/.left_deadline.avg
      4034 ± 33%     +53.7%       6200 ± 13%  sched_debug.cfs_rq:/.left_vruntime.avg
    583523 ±  4%     +13.7%     663177 ±  4%  sched_debug.cfs_rq:/.load.avg
      0.58 ±  5%     +13.7%       0.66 ±  4%  sched_debug.cfs_rq:/.nr_queued.avg
     14.14 ± 22%     +45.5%      20.57 ± 15%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
     67.99 ± 17%     +33.5%      90.77 ± 13%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
     13.66 ± 23%     +43.0%      19.54 ± 14%  sched_debug.cfs_rq:/.removed.util_avg.avg
     66.93 ± 17%     +31.0%      87.71 ± 14%  sched_debug.cfs_rq:/.removed.util_avg.stddev
      4034 ± 33%     +53.7%       6200 ± 13%  sched_debug.cfs_rq:/.right_vruntime.avg
    554.59 ±  2%      +8.7%     602.90 ±  3%  sched_debug.cfs_rq:/.runnable_avg.avg
      1553 ±  7%     +26.1%       1959 ± 17%  sched_debug.cfs_rq:/.runnable_avg.max
    266.15 ±  7%     +25.4%     333.85 ±  6%  sched_debug.cfs_rq:/.runnable_avg.stddev
      0.03 ± 76%    +526.1%       0.19 ± 31%  sched_debug.cfs_rq:/.spread.avg
      2.81 ± 88%    +377.6%      13.40 ± 48%  sched_debug.cfs_rq:/.spread.max
      0.24 ± 77%    +426.2%       1.26 ± 34%  sched_debug.cfs_rq:/.spread.stddev
-6.962e+10          -329.4%  1.597e+11 ± 20%  sched_debug.cfs_rq:/.sum_w_vruntime.avg
 1.654e+12 ±112%    +407.7%  8.398e+12 ± 18%  sched_debug.cfs_rq:/.sum_w_vruntime.max
    106852 ± 31%     +74.1%     185984 ± 11%  sched_debug.cfs_rq:/.sum_weight.avg
     29261           -10.3%      26241        sched_debug.cfs_rq:/.zero_vruntime.avg
    516.36 ±  3%     +85.2%     956.40 ±  3%  sched_debug.cpu.clock_task.stddev
    551602            -7.0%     513143        sched_debug.cpu.curr->pid.max
      0.59 ±  4%     +12.6%       0.67 ±  4%  sched_debug.cpu.nr_running.avg
     74120 ± 14%     -29.8%      52067 ± 25%  sched_debug.cpu.nr_switches.max
      4844 ± 18%     -36.8%       3062 ± 31%  sched_debug.cpu.nr_switches.stddev
      0.08 ± 47%     +83.8%       0.15 ± 20%  sched_debug.cpu.nr_uninterruptible.avg
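
As a reading aid for the table above, here is a minimal sketch of how the %change column can be recomputed from the two value columns. The helper names are hypothetical and not part of lkp-tests. It assumes, based on the mpstat rows, that a change printed without a percent sign is a plain difference in percentage points rather than a relative change.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, used for rows that are already percentages."""
    return new - base


# stress-ng.shm.ops_per_sec: 18394 -> 17072
print(f"ops_per_sec: {pct_change(18394, 17072):+.1f}%")

# meminfo.PageTables: 354997 -> 751255
print(f"PageTables:  {pct_change(354997, 751255):+.1f}%")

# mpstat.cpu.all.soft%: 3.69 -> 9.29 (percentage points, not a relative change)
print(f"soft%:       {point_change(3.69, 9.29):+.1f}")
```

The values in the table are rounded means over several runs, so a recomputed figure can differ from the printed one in the last digit.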




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




* Re: [linux-next:master] [vmalloc]  60ced5818f: stress-ng.shm.ops_per_sec 7.2% regression
  2026-06-02 13:18 [linux-next:master] [vmalloc] 60ced5818f: stress-ng.shm.ops_per_sec 7.2% regression kernel test robot
@ 2026-06-09  7:35 ` Yeoreum Yun
  2026-06-15  9:51   ` Yeoreum Yun
  0 siblings, 1 reply; 5+ messages in thread
From: Yeoreum Yun @ 2026-06-09  7:35 UTC (permalink / raw)
  To: kernel test robot
  Cc: Ryan Roberts, oe-lkp, lkp, Andrew Morton, Muhammad Usama Anjum,
	Vlastimil Babka, Zi Yan, David Hildenbrand, Uladzislau Rezki,
	Brendan Jackman, David Sterba, Johannes Weiner, Liam Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Nick Terrell,
	Suren Baghdasaryan, Vishal Moola, linux-mm


Thanks. I'll check it.

-- 
Sincerely,
Yeoreum Yun


* Re: [linux-next:master] [vmalloc]  60ced5818f: stress-ng.shm.ops_per_sec 7.2% regression
  2026-06-09  7:35 ` Yeoreum Yun
@ 2026-06-15  9:51   ` Yeoreum Yun
  2026-06-15 15:32     ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 5+ messages in thread
From: Yeoreum Yun @ 2026-06-15  9:51 UTC (permalink / raw)
  To: kernel test robot
  Cc: Ryan Roberts, oe-lkp, lkp, Andrew Morton, Muhammad Usama Anjum,
	Vlastimil Babka, Zi Yan, David Hildenbrand, Uladzislau Rezki,
	Brendan Jackman, David Sterba, Johannes Weiner, Liam Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Nick Terrell,
	Suren Baghdasaryan, Vishal Moola, linux-mm

> Thanks. I'll check it.

Looking at the patch [1], the regression reported by lkp [2] appears to
be caused by a lack of order-0 pages in the PCP lists,
which increases zone-locked allocations (mm_page_alloc_zone_locked).

On my reproduction setup (an x86_64 workstation), most pages freed through
free_pages_bulk() were order-2 pages (nr_pages == 4), including
vmalloc-backed stacks. With patch [1], these order-2 pages are returned to
the order-2 PCP lists, unlike commit 4aa4abf1f1 (“mm/page_alloc: optimize free_contig_range()”),
which effectively populated the order-0 PCP lists.

Since the shmem workload appears to fault memory at PAGE_SIZE granularity,
the reduced availability of order-0 pages in PCP lists seems to increase
zone-locked order-0 allocations, which may explain the regression observed by lkp.
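
To make the mechanism concrete, here is a minimal userspace sketch (not
kernel code). It assumes a toy per-CPU cache that keeps one counter per
order and never splits a cached higher-order block to serve an order-0
request. All toy_* names are hypothetical and exist only for illustration.

```c
#include <assert.h>

#define NR_ORDERS 4	/* the toy model only tracks orders 0..3 */

/* Hypothetical stand-in for the per-CPU lists: cached blocks per order. */
struct toy_pcp {
	unsigned long count[NR_ORDERS];
	unsigned long misses;	/* order-0 requests the cache could not serve */
};

/* Free a 2^order block as a single unit onto the list for that order. */
static void toy_free_block(struct toy_pcp *pcp, unsigned int order)
{
	assert(order < NR_ORDERS);
	pcp->count[order]++;
}

/* Free a 2^order block as 2^order individual order-0 pages. */
static void toy_free_split(struct toy_pcp *pcp, unsigned int order)
{
	assert(order < NR_ORDERS);
	pcp->count[0] += 1UL << order;
}

/* Request one order-0 page; larger cached blocks are never split here. */
static void toy_alloc_order0(struct toy_pcp *pcp)
{
	if (pcp->count[0])
		pcp->count[0]--;
	else
		pcp->misses++;	/* models the zone-locked slow path */
}

/* Free one order-2 block, then request four order-0 pages. */
static unsigned long toy_run(int split)
{
	struct toy_pcp pcp = { { 0 }, 0 };
	int i;

	if (split)
		toy_free_split(&pcp, 2);
	else
		toy_free_block(&pcp, 2);

	for (i = 0; i < 4; i++)
		toy_alloc_order0(&pcp);

	return pcp.misses;
}
```

In this model toy_run(0) returns 4 (every order-0 request misses) and
toy_run(1) returns 0. The real allocator is of course more involved; the
sketch only shows why the order the pages are freed at matters.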

Performance comparison (5 runs):
  - perf stat -e 'kmem:mm_page_alloc_extfrag' --filter 'alloc_order == 0' \
              -e 'kmem:mm_page_alloc_zone_locked' --filter 'order == 0' \
              -e 'kmem:mm_page_alloc' --filter 'order == 0' -- ./repro-script

Metric                               4aa4abf1f1          60ced5818f          Difference
------------------------------------------------------------------------------------------
Zone-locked Allocation Ratio (%)     46.39 ± 0.17%       48.31 ± 0.37%       +1.92 pp

  - Ratio = (mm_page_alloc_extfrag + mm_page_alloc_zone_locked) / mm_page_alloc × 100
  - Values are reported as median ± relative half-range across three runs.
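
For reference, the ratio and the percentage-point difference can be
written as the small helpers below. This is only a sketch: the function
names are made up for illustration, and the raw tracepoint counts are not
included in this mail, so the inputs have to come from your own perf stat
output.

```c
#include <assert.h>

/* Ratio as defined above, in percent. Inputs are raw tracepoint counts. */
static double zone_locked_ratio_pct(unsigned long extfrag,
				    unsigned long zone_locked,
				    unsigned long total_allocs)
{
	assert(total_allocs > 0);
	return 100.0 * (double)(extfrag + zone_locked) / (double)total_allocs;
}

/* Difference between two percentages, in percentage points (pp). */
static double pp_diff(double before_pct, double after_pct)
{
	return after_pct - before_pct;
}
```

With the medians from the table, pp_diff(46.39, 48.31) gives the +1.92 pp
shown in the Difference column.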

The shmem test appears to handle page faults at PAGE_SIZE granularity, which seems to amplify the impact of the reduced availability of order-0 pages.


As an experiment, I modified free_pages_bulk() to free pages individually
as order-0 pages when nr_contig <= (1 << PAGE_ALLOC_COSTLY_ORDER).
This restores behavior closer to 4aa4abf1f1 and results in almost
no difference compared to that commit.
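
The decision the experiment makes can be modelled in userspace as below.
This is a sketch under assumptions: pages are represented by plain PFN
values instead of struct page pointers, the toy_* names are hypothetical,
and the code only counts which path each run of pages takes rather than
freeing anything. PAGE_ALLOC_COSTLY_ORDER is 3 in mainline, so the
threshold is 8 pages.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_COSTLY_ORDER 3	/* mirrors PAGE_ALLOC_COSTLY_ORDER */

struct toy_stats {
	unsigned long order0_frees;	/* pages freed one at a time */
	unsigned long range_frees;	/* runs handed over as one range */
};

/* Length of the leading run of consecutive PFNs; n must be non-zero. */
static size_t toy_num_contig(const unsigned long *pfns, size_t n)
{
	size_t i;

	assert(n > 0);
	for (i = 1; i < n; i++)
		if (pfns[i] != pfns[i - 1] + 1)
			break;
	return i;
}

/* Walk the array run by run and record which free path each run takes. */
static struct toy_stats toy_free_pages_bulk(const unsigned long *pfns,
					    size_t n)
{
	const size_t nr_costly = (size_t)1 << TOY_COSTLY_ORDER;
	struct toy_stats st = { 0, 0 };

	while (n) {
		size_t nr_contig = toy_num_contig(pfns, n);

		if (nr_contig <= nr_costly)
			st.order0_frees += nr_contig;	/* small run: per page */
		else
			st.range_frees++;		/* large run: as a range */

		pfns += nr_contig;
		n -= nr_contig;
	}
	return st;
}
```

For an array holding a 4-page run, a single page and a 12-page run, the
model reports 5 order-0 frees and 1 range free, so a 4-page vmalloc stack
always ends up on the order-0 path.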

Performance comparison (5 runs)
  - ./repro-script [3]

Metric                     4aa4abf1f1            change             Difference
----------------------------------------------------------------------------------
bogo_ops                   656,513 ± 0.34%      656,488 ± 0.23%          -25
bogo_ops/s (realtime)      10,935.95 ± 0.34%    10,935.86 ± 0.23%      -0.09
bogo_ops/s (usr+sys time)     220.31 ± 0.40%       219.30 ± 0.21%      -1.01

The differences are negligible, so the change essentially restores the
performance observed with 4aa4abf1f1.
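
To put the absolute differences in perspective, they can be expressed
relative to the 4aa4abf1f1 baseline. The helper below is just a sketch
with a hypothetical name.

```c
#include <assert.h>

/* Relative difference of `changed` against `base`, in percent. */
static double rel_diff_pct(double base, double changed)
{
	assert(base != 0.0);
	return 100.0 * (changed - base) / base;
}
```

Applied to the table, rel_diff_pct(656513, 656488) is about -0.004%,
rel_diff_pct(10935.95, 10935.86) is about -0.0008%, and
rel_diff_pct(220.31, 219.30) is about -0.46%. The first two are far below
the reported run-to-run spread and the third is of the same order as it.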

Given that the regression appears to be driven by a synthetic workload that
combines frequent shmem page faults with repeated stack allocation/free operations,
I do not think this is a significant concern for typical real-world workloads.

Thanks!

[1] https://lore.kernel.org/all/20260401101634.2868165-2-usama.anjum@arm.com/
[2] https://lore.kernel.org/r/202606022131.112319f2-lkp@intel.com
[3] https://download.01.org/0day-ci/archive/20260602/202606022131.112319f2-lkp@intel.com/repro-script

-----------------&<------------------
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 91bef811a771..48d9eaa1a2f3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5206,13 +5206,24 @@ EXPORT_SYMBOL_GPL(alloc_pages_bulk_noprof);
  */
 void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
 {
+       const unsigned long nr_costly = 1UL << PAGE_ALLOC_COSTLY_ORDER;
+
        while (nr_pages) {
                unsigned long nr_contig = num_pages_contiguous(page_array, nr_pages);

-               __free_contig_range(page_to_pfn(*page_array), nr_contig);
+               if (nr_contig <= nr_costly) {
+                       while (nr_contig--) {
+                               __free_page(*page_array);
+                               nr_pages--;
+                               page_array++;
+                       }
+               } else {
+                       __free_contig_range(page_to_pfn(*page_array), nr_contig);
+
+                       nr_pages -= nr_contig;
+                       page_array += nr_contig;
+               }

-               nr_pages -= nr_contig;
-               page_array += nr_contig;
                cond_resched();
        }
 }

> -- 
> Sincerely,
> Yeoreum Yun

-- 
Sincerely,
Yeoreum Yun



* Re: [linux-next:master] [vmalloc] 60ced5818f: stress-ng.shm.ops_per_sec 7.2% regression
  2026-06-15  9:51   ` Yeoreum Yun
@ 2026-06-15 15:32     ` David Hildenbrand (Arm)
  2026-06-15 16:45       ` Yeoreum Yun
  0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-15 15:32 UTC (permalink / raw)
  To: Yeoreum Yun, kernel test robot
  Cc: Ryan Roberts, oe-lkp, lkp, Andrew Morton, Muhammad Usama Anjum,
	Vlastimil Babka, Zi Yan, Uladzislau Rezki, Brendan Jackman,
	David Sterba, Johannes Weiner, Liam Howlett, Lorenzo Stoakes,
	Michal Hocko, Mike Rapoport, Nick Terrell, Suren Baghdasaryan,
	Vishal Moola, linux-mm


>>> -- 
>>> 0-DAY CI Kernel Test Service
>>> https://github.com/intel/lkp-tests/wiki
>>
>> Thanks. I'll check it.

Thanks for digging into the details!

> 
> Looking at the patch [1], the regression reported by lkp [2] appears to
> be caused by a lack of order-0 pages in the PCP lists,
> which increases zone-locked allocations (mm_page_alloc_zone_locked).
> 
> On my reproduced setup (an x86_64 workstation), most pages freed through
> free_pages_bulk() were order-2 pages (nr_pages == 4), including
> vmalloc-backed stacks. With patch [1], these order-2 pages are returned to
> the order-2 PCP lists, unlike commit 4aa4abf1f1 (“mm/page_alloc: optimize free_contig_range()”),
> which effectively populated the order-0 PCP lists.
> 
> Since the shmem workload appears to fault memory at PAGE_SIZE granularity,
> the reduced availability of order-0 pages in PCP lists seems to increase
> zone-locked order-0 allocations, which may explain the regression observed by lkp.
> 
> Performance comparison (5 runs):
>   - perf stat -e 'kmem:mm_page_alloc_extfrag' --filter 'alloc_order == 0' \
>               -e 'kmem:mm_page_alloc_zone_locked' --filter 'order == 0' \
>               -e 'kmem:mm_page_alloc' --filter 'order == 0' -- ./repro-script
> 
> Metric                               4aa4abf1f1           60ced5818f        Difference
> 
> ------------------------------------------------------------------------------------------
> 
> Zone-locked Allocation Ratio (%)     46.39 ± 0.17%       48.31 ± 0.37%       +1.92 pp
> 
>   - Ratio = (mm_page_alloc_extfrag + mm_page_alloc_zone_locked) / mm_page_alloc × 100
>   - Values are reported as median ± relative half-range across three runs.
> 
> The shmem test appears to handle page faults at PAGE_SIZE granularity, which seems to amplify the impact of the reduced availability of order-0 pages.

Okay, so less fragmentation in the PCP results in falling back to the buddy
for order-0. Given that we don't split in the PCP and instead fall back to
the buddy, that makes sense.

> 
> 
> As an experiment, I modified free_pages_bulk() to free pages individually
> as order-0 pages when nr_contig <= (1 << PAGE_ALLOC_COSTLY_ORDER).
> This restores behavior closer to 4aa4abf1f1 and results in almost
> no difference compared to that commit.
> 
> Performance comparison (5 runs)
>   - ./repro-script
> 
> Metric                     4aa4abf1f1            change             Difference
> ----------------------------------------------------------------------------------
> bogo_ops                   656,513 ± 0.34%      656,488 ± 0.23%          -25
> 
> bogo_ops/s (realtime)      10,935.95 ± 0.34%    10,935.86 ± 0.23%      -0.09
> 
> bogo_ops/s (usr+sys time)     220.31 ± 0.40%       219.30 ± 0.21%      -1.01
> 
> The differences are negligible and essentially restore the performance
> observed with 4aa4abf1f1.

Makes sense.

Looking at the original results, I spotted

    354997 ± 14%    +111.6%     751255 ±  4%  meminfo.PageTables

So we consumed twice the (process) page tables. Did you also manage to reproduce
that or do you have an explanation for that?

> 
> Given that the regression appears to be driven by a synthetic workload that
> combines frequent shmem page faults with repeated stack allocation/free operations,
> I do not think this is a significant concern for typical real-world workloads.

Yeah, if it's "we have less fragmentation", I agree. I assume that using
order-2 folios for shmem would likely mitigate the problem in a similar way.

-- 
Cheers,

David


* Re: [linux-next:master] [vmalloc] 60ced5818f: stress-ng.shm.ops_per_sec 7.2% regression
  2026-06-15 15:32     ` David Hildenbrand (Arm)
@ 2026-06-15 16:45       ` Yeoreum Yun
  0 siblings, 0 replies; 5+ messages in thread
From: Yeoreum Yun @ 2026-06-15 16:45 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: kernel test robot, Ryan Roberts, oe-lkp, lkp, Andrew Morton,
	Muhammad Usama Anjum, Vlastimil Babka, Zi Yan, Uladzislau Rezki,
	Brendan Jackman, David Sterba, Johannes Weiner, Liam Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Nick Terrell,
	Suren Baghdasaryan, Vishal Moola, linux-mm

Hi David,

[...]
 
> Looking at the original results, I spotted
> 
>     354997 ± 14%    +111.6%     751255 ±  4%  meminfo.PageTables
> 
> So we consumed twice the (process) page tables. Did you also manage to reproduce
> that or do you have an explanation for that?

Unfortunately not. When I ran the reproduce script and watched the peak
PageTables usage in meminfo, it was almost the same (roughly 90,000 KB to
100,000 KB). I couldn't observe this dramatic increase with this testcase,
so I don't think it is caused by this patch.

> 
> > 
> > Given that the regression appears to be driven by a synthetic workload that
> > combines frequent shmem page faults with repeated stack allocation/free operations,
> > I do not think this is a significant concern for typical real-world workloads.
> 
> Yeah, if it's "we have less fragmentation", I agree. Using order-2 folios for
> shmem would likely similarly mitigate the problem I assume.
> 

Agree. Thanks!

-- 
Sincerely,
Yeoreum Yun


Thread overview: 5+ messages
2026-06-02 13:18 [linux-next:master] [vmalloc] 60ced5818f: stress-ng.shm.ops_per_sec 7.2% regression kernel test robot
2026-06-09  7:35 ` Yeoreum Yun
2026-06-15  9:51   ` Yeoreum Yun
2026-06-15 15:32     ` David Hildenbrand (Arm)
2026-06-15 16:45       ` Yeoreum Yun
