* [linus:master] [mm] 9890ecab6a: vm-scalability.throughput 3.8% regression
From: kernel test robot @ 2026-03-10 6:39 UTC
To: Ankur Arora
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Raghavendra K T,
David Hildenbrand, Andy Lutomirski, Borislav Petkov (AMD),
Boris Ostrovsky, H. Peter Anvin, Ingo Molnar,
Konrad Rzeszutek Wilk, Lance Yang, Liam R. Howlett, Li Zhe,
Lorenzo Stoakes, Mateusz Guzik, Matthew Wilcox, Michal Hocko,
Mike Rapoport, Peter Zijlstra, Suren Baghdasaryan,
Thomas Gleixner, Vlastimil Babka, linux-mm, oliver.sang
Hello,
kernel test robot noticed a 3.8% regression of vm-scalability.throughput on:
commit: 9890ecab6ad9c0d3d342469f3b619fd704b5c59a ("mm: folio_zero_user: clear pages sequentially")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[still regression on linus/master 0031c06807cfa8aa51a759ff8aa09e1aa48149af]
[still regression on linux-next/master c025f6cf4209e1542ec2afebe49f42bbaf1a5c7b]
testcase: vm-scalability
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:
runtime: 300s
size: 8T
test: anon-w-seq-mt
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202603101342.297fb270-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260310/202603101342.297fb270-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-14/performance/x86_64-rhel-9.4/debian-13-x86_64-20250902.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability
commit:
cb431accb3 ("x86/clear_page: introduce clear_pages()")
9890ecab6a ("mm: folio_zero_user: clear pages sequentially")
cb431accb36e51b6 9890ecab6ad9c0d3d342469f3b6
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.08 ± 3% +8.7% 0.09 ± 3% vm-scalability.free_time
357969 -6.6% 334511 vm-scalability.median
1.034e+08 -3.8% 99382138 vm-scalability.throughput
634243 -13.6% 548120 ± 6% vm-scalability.time.involuntary_context_switches
12706518 -6.6% 11872543 vm-scalability.time.minor_page_faults
15142 -4.2% 14512 vm-scalability.time.system_time
16939 +6.1% 17975 vm-scalability.time.user_time
251227 -6.8% 234071 vm-scalability.time.voluntary_context_switches
1.791e+10 -6.6% 1.674e+10 vm-scalability.workload
0.30 -7.5% 0.28 turbostat.IPC
9203 -5.5% 8693 vmstat.system.cs
0.08 +0.0 0.08 mpstat.cpu.all.soft%
25.14 +1.5 26.62 mpstat.cpu.all.usr%
3.13 +18.3% 3.71 perf-stat.i.MPKI
6.22e+10 -6.6% 5.81e+10 perf-stat.i.branch-instructions
61.69 +9.8 71.51 perf-stat.i.cache-miss-rate%
6.147e+08 +10.7% 6.805e+08 perf-stat.i.cache-misses
9.904e+08 -4.7% 9.436e+08 perf-stat.i.cache-references
9303 -5.2% 8823 perf-stat.i.context-switches
2.17 +8.2% 2.35 perf-stat.i.cpi
598.97 -4.6% 571.28 perf-stat.i.cpu-migrations
1.95e+11 -6.6% 1.822e+11 perf-stat.i.instructions
0.47 -7.2% 0.43 perf-stat.i.ipc
43153 -6.5% 40334 perf-stat.i.minor-faults
43153 -6.5% 40335 perf-stat.i.page-faults
3.16 +18.5% 3.74 perf-stat.overall.MPKI
0.02 +0.0 0.03 perf-stat.overall.branch-miss-rate%
62.11 +10.1 72.19 perf-stat.overall.cache-miss-rate%
2.19 +8.3% 2.37 perf-stat.overall.cpi
692.89 -8.6% 633.07 perf-stat.overall.cycles-between-cache-misses
0.46 -7.6% 0.42 perf-stat.overall.ipc
6.121e+10 -6.8% 5.705e+10 perf-stat.ps.branch-instructions
6.054e+08 +10.5% 6.689e+08 perf-stat.ps.cache-misses
9.747e+08 -4.9% 9.266e+08 perf-stat.ps.cache-references
9124 -5.6% 8613 perf-stat.ps.context-switches
583.66 -4.9% 555.21 perf-stat.ps.cpu-migrations
1.919e+11 -6.8% 1.789e+11 perf-stat.ps.instructions
42389 -6.7% 39549 perf-stat.ps.minor-faults
42389 -6.7% 39549 perf-stat.ps.page-faults
5.812e+13 -6.5% 5.434e+13 perf-stat.total.instructions
40.26 -40.3 0.00 perf-profile.calltrace.cycles-pp.clear_subpage.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
40.76 -2.1 38.66 perf-profile.calltrace.cycles-pp.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
42.59 -2.0 40.61 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
42.54 -2.0 40.57 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
42.54 -2.0 40.57 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
42.40 -2.0 40.43 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
42.32 -2.0 40.36 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
42.23 -2.0 40.27 perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
41.70 -2.0 39.74 perf-profile.calltrace.cycles-pp.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.76 -0.0 0.72 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.72 -0.0 0.68 perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page
0.67 -0.0 0.64 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd
0.72 -0.0 0.69 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
0.56 -0.0 0.54 perf-profile.calltrace.cycles-pp.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof
0.00 +0.8 0.76 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
30.25 +1.2 31.46 perf-profile.calltrace.cycles-pp.do_rw_once
40.49 -40.5 0.00 perf-profile.children.cycles-pp.clear_subpage
42.61 -2.0 40.63 perf-profile.children.cycles-pp.asm_exc_page_fault
42.55 -2.0 40.58 perf-profile.children.cycles-pp.exc_page_fault
42.54 -2.0 40.57 perf-profile.children.cycles-pp.do_user_addr_fault
42.40 -2.0 40.43 perf-profile.children.cycles-pp.handle_mm_fault
42.33 -2.0 40.36 perf-profile.children.cycles-pp.__handle_mm_fault
42.23 -2.0 40.27 perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
41.70 -2.0 39.74 perf-profile.children.cycles-pp.vma_alloc_anon_folio_pmd
40.83 -1.9 38.92 perf-profile.children.cycles-pp.folio_zero_user
63.93 -1.2 62.77 perf-profile.children.cycles-pp.do_access
0.95 -0.0 0.91 perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
0.78 -0.0 0.74 perf-profile.children.cycles-pp.vma_alloc_folio_noprof
0.95 -0.0 0.92 perf-profile.children.cycles-pp.alloc_pages_mpol
0.79 -0.0 0.76 perf-profile.children.cycles-pp.get_page_from_freelist
0.63 -0.0 0.60 perf-profile.children.cycles-pp.prep_new_page
40.31 +2.5 42.80 perf-profile.children.cycles-pp.do_rw_once
39.77 -39.8 0.00 perf-profile.self.cycles-pp.clear_subpage
9.54 -0.3 9.23 perf-profile.self.cycles-pp.do_access
0.55 -0.0 0.53 perf-profile.self.cycles-pp.prep_new_page
38.35 +2.6 40.96 perf-profile.self.cycles-pp.do_rw_once
0.36 ± 2% +38.0 38.32 perf-profile.self.cycles-pp.folio_zero_user
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linus:master] [mm] 9890ecab6a: vm-scalability.throughput 3.8% regression
From: Ankur Arora @ 2026-03-11 19:04 UTC
To: kernel test robot
Cc: Ankur Arora, oe-lkp, lkp, linux-kernel, Andrew Morton,
Raghavendra K T, David Hildenbrand, Andy Lutomirski,
Borislav Petkov (AMD), Boris Ostrovsky, H. Peter Anvin,
Ingo Molnar, Konrad Rzeszutek Wilk, Lance Yang, Liam R. Howlett,
Li Zhe, Lorenzo Stoakes, Mateusz Guzik, Matthew Wilcox,
Michal Hocko, Mike Rapoport, Peter Zijlstra, Suren Baghdasaryan,
Thomas Gleixner, Vlastimil Babka, linux-mm
kernel test robot <oliver.sang@intel.com> writes:
> Hello,
>
> kernel test robot noticed a 3.8% regression of vm-scalability.throughput on:
>
[ ... ]
> testcase: vm-scalability
> config: x86_64-rhel-9.4
> compiler: gcc-14
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> parameters:
>
> runtime: 300s
> size: 8T
> test: anon-w-seq-mt
> cpufreq_governor: performance
This test exercises the THP sequential zeroing path.
> 15142 -4.2% 14512 vm-scalability.time.system_time
> 16939 +6.1% 17975 vm-scalability.time.user_time
stime drops because folio_zero_user() is more efficient. But utime goes up
because of a higher cache-miss rate on the user side, since folio_zero_user()
now clears pages sequentially instead of in the earlier left-right fashion:
> 61.69 +9.8 71.51 perf-stat.i.cache-miss-rate%
> 6.147e+08 +10.7% 6.805e+08 perf-stat.i.cache-misses
> 9.904e+08 -4.7% 9.436e+08 perf-stat.i.cache-references
> 2.17 +8.2% 2.35 perf-stat.i.cpi
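To make the ordering difference concrete, here is a minimal userspace sketch
(illustrative only, not the kernel's actual code: clear_page() below is just a
memset() stand-in, and the old process_huge_page() traversal is more involved
than the convergent loop shown here):

  #include <stdlib.h>
  #include <string.h>

  #define PAGE_SIZE 4096UL

  static void clear_page(void *p)
  {
          memset(p, 0, PAGE_SIZE);
  }

  /*
   * Old order (roughly what the process_huge_page() based code did):
   * clear the subpages far from the faulting index first, converging
   * on it from both ends, so the faulting subpage is cleared last and
   * is still cache-hot when the faulting thread returns to userspace.
   */
  static void clear_converging(char *base, unsigned long nr,
                               unsigned long fault_idx)
  {
          unsigned long l = 0, r = nr - 1;

          while (l < fault_idx || r > fault_idx) {
                  if (fault_idx - l >= r - fault_idx)
                          clear_page(base + l++ * PAGE_SIZE);
                  else
                          clear_page(base + r-- * PAGE_SIZE);
          }
          clear_page(base + fault_idx * PAGE_SIZE);  /* hot page last */
  }

  /*
   * New order (9890ecab6a): one forward sweep over the whole folio.
   * Friendlier to wide clearing primitives, but for a folio larger
   * than the cache the earliest-cleared subpages are evicted by the
   * time the sweep finishes.
   */
  static void clear_sequential(char *base, unsigned long nr)
  {
          unsigned long i;

          for (i = 0; i < nr; i++)
                  clear_page(base + i * PAGE_SIZE);
  }

  int main(void)
  {
          unsigned long nr = 512;            /* 2 MB folio as 512 x 4 KB */
          char *buf = malloc(nr * PAGE_SIZE);

          if (!buf)
                  return 1;
          clear_converging(buf, nr, 17);     /* fault at subpage 17 */
          clear_sequential(buf, nr);
          free(buf);
          return 0;
  }

The point is only the traversal order: both variants clear the same bytes, but
the convergent order finishes on the subpage userspace is about to touch.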
I had noted similar behaviour with anon-w-seq-hugetlb in 93552c9a3350:

  vm-scalability/anon-w-seq-hugetlb: this workload runs with 384 processes
  (one for each CPU), each zeroing anonymously mapped hugetlb memory which
  is then accessed sequentially.
                         stime                   utime
  discontiguous-page     1739.93 ( +- 6.15% )    1016.61 ( +- 4.75% )
  contiguous-page        1853.70 ( +- 2.51% )    1187.13 ( +- 3.50% )
  batched-pages          1756.75 ( +- 2.98% )    1133.32 ( +- 4.89% )
  neighbourhood-last     1725.18 ( +- 4.59% )    1123.78 ( +- 7.38% )
Both stime and utime respond roughly as expected. There is a fair amount of
run-to-run variation, but the general trend is that stime drops and utime
increases. There are a few oddities, like contiguous-page performing very
differently from batched-pages.
As such, this is likely an uncommon pattern where we saturate the memory
bandwidth (since all CPUs are running the test) and are cache-constrained
at the same time because we access the entire region.
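A rough back-of-envelope (my numbers, assuming roughly 38.5 MB of L3 per
socket on the 8380H, shared by 56 hyperthreads):

  38.5 MB L3 / 56 threads  ~=  0.7 MB of effective LLC per thread
  2 MB THP clear           >>  0.7 MB

so by the time a front-to-back sweep of a 2 MB folio finishes, its
earliest-cleared lines (and likely the faulting subpage) have already been
evicted, whereas the old order at least left the faulting subpage hot.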
Ankur