* [linus:master] [mm] 9890ecab6a: vm-scalability.throughput 3.8% regression
From: kernel test robot @ 2026-03-10 6:39 UTC
To: Ankur Arora
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Raghavendra K T,
David Hildenbrand, Andy Lutomirski, Borislav Petkov (AMD),
Boris Ostrovsky, H. Peter Anvin, Ingo Molnar,
Konrad Rzeszutek Wilk, Lance Yang, Liam R. Howlett, Li Zhe,
Lorenzo Stoakes, Mateusz Guzik, Matthew Wilcox, Michal Hocko,
Mike Rapoport, Peter Zijlstra, Suren Baghdasaryan,
Thomas Gleixner, Vlastimil Babka, linux-mm, oliver.sang
Hello,
kernel test robot noticed a 3.8% regression of vm-scalability.throughput on:
commit: 9890ecab6ad9c0d3d342469f3b619fd704b5c59a ("mm: folio_zero_user: clear pages sequentially")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[still regression on linus/master 0031c06807cfa8aa51a759ff8aa09e1aa48149af]
[still regression on linux-next/master c025f6cf4209e1542ec2afebe49f42bbaf1a5c7b]
testcase: vm-scalability
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:
runtime: 300s
size: 8T
test: anon-w-seq-mt
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202603101342.297fb270-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260310/202603101342.297fb270-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-14/performance/x86_64-rhel-9.4/debian-13-x86_64-20250902.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability
commit:
cb431accb3 ("x86/clear_page: introduce clear_pages()")
9890ecab6a ("mm: folio_zero_user: clear pages sequentially")
cb431accb36e51b6 9890ecab6ad9c0d3d342469f3b6
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.08 ± 3% +8.7% 0.09 ± 3% vm-scalability.free_time
357969 -6.6% 334511 vm-scalability.median
1.034e+08 -3.8% 99382138 vm-scalability.throughput
634243 -13.6% 548120 ± 6% vm-scalability.time.involuntary_context_switches
12706518 -6.6% 11872543 vm-scalability.time.minor_page_faults
15142 -4.2% 14512 vm-scalability.time.system_time
16939 +6.1% 17975 vm-scalability.time.user_time
251227 -6.8% 234071 vm-scalability.time.voluntary_context_switches
1.791e+10 -6.6% 1.674e+10 vm-scalability.workload
0.30 -7.5% 0.28 turbostat.IPC
9203 -5.5% 8693 vmstat.system.cs
0.08 +0.0 0.08 mpstat.cpu.all.soft%
25.14 +1.5 26.62 mpstat.cpu.all.usr%
3.13 +18.3% 3.71 perf-stat.i.MPKI
6.22e+10 -6.6% 5.81e+10 perf-stat.i.branch-instructions
61.69 +9.8 71.51 perf-stat.i.cache-miss-rate%
6.147e+08 +10.7% 6.805e+08 perf-stat.i.cache-misses
9.904e+08 -4.7% 9.436e+08 perf-stat.i.cache-references
9303 -5.2% 8823 perf-stat.i.context-switches
2.17 +8.2% 2.35 perf-stat.i.cpi
598.97 -4.6% 571.28 perf-stat.i.cpu-migrations
1.95e+11 -6.6% 1.822e+11 perf-stat.i.instructions
0.47 -7.2% 0.43 perf-stat.i.ipc
43153 -6.5% 40334 perf-stat.i.minor-faults
43153 -6.5% 40335 perf-stat.i.page-faults
3.16 +18.5% 3.74 perf-stat.overall.MPKI
0.02 +0.0 0.03 perf-stat.overall.branch-miss-rate%
62.11 +10.1 72.19 perf-stat.overall.cache-miss-rate%
2.19 +8.3% 2.37 perf-stat.overall.cpi
692.89 -8.6% 633.07 perf-stat.overall.cycles-between-cache-misses
0.46 -7.6% 0.42 perf-stat.overall.ipc
6.121e+10 -6.8% 5.705e+10 perf-stat.ps.branch-instructions
6.054e+08 +10.5% 6.689e+08 perf-stat.ps.cache-misses
9.747e+08 -4.9% 9.266e+08 perf-stat.ps.cache-references
9124 -5.6% 8613 perf-stat.ps.context-switches
583.66 -4.9% 555.21 perf-stat.ps.cpu-migrations
1.919e+11 -6.8% 1.789e+11 perf-stat.ps.instructions
42389 -6.7% 39549 perf-stat.ps.minor-faults
42389 -6.7% 39549 perf-stat.ps.page-faults
5.812e+13 -6.5% 5.434e+13 perf-stat.total.instructions
40.26 -40.3 0.00 perf-profile.calltrace.cycles-pp.clear_subpage.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
40.76 -2.1 38.66 perf-profile.calltrace.cycles-pp.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
42.59 -2.0 40.61 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
42.54 -2.0 40.57 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
42.54 -2.0 40.57 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
42.40 -2.0 40.43 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
42.32 -2.0 40.36 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
42.23 -2.0 40.27 perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
41.70 -2.0 39.74 perf-profile.calltrace.cycles-pp.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.76 -0.0 0.72 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.72 -0.0 0.68 perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page
0.67 -0.0 0.64 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd
0.72 -0.0 0.69 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
0.56 -0.0 0.54 perf-profile.calltrace.cycles-pp.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof
0.00 +0.8 0.76 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
30.25 +1.2 31.46 perf-profile.calltrace.cycles-pp.do_rw_once
40.49 -40.5 0.00 perf-profile.children.cycles-pp.clear_subpage
42.61 -2.0 40.63 perf-profile.children.cycles-pp.asm_exc_page_fault
42.55 -2.0 40.58 perf-profile.children.cycles-pp.exc_page_fault
42.54 -2.0 40.57 perf-profile.children.cycles-pp.do_user_addr_fault
42.40 -2.0 40.43 perf-profile.children.cycles-pp.handle_mm_fault
42.33 -2.0 40.36 perf-profile.children.cycles-pp.__handle_mm_fault
42.23 -2.0 40.27 perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
41.70 -2.0 39.74 perf-profile.children.cycles-pp.vma_alloc_anon_folio_pmd
40.83 -1.9 38.92 perf-profile.children.cycles-pp.folio_zero_user
63.93 -1.2 62.77 perf-profile.children.cycles-pp.do_access
0.95 -0.0 0.91 perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
0.78 -0.0 0.74 perf-profile.children.cycles-pp.vma_alloc_folio_noprof
0.95 -0.0 0.92 perf-profile.children.cycles-pp.alloc_pages_mpol
0.79 -0.0 0.76 perf-profile.children.cycles-pp.get_page_from_freelist
0.63 -0.0 0.60 perf-profile.children.cycles-pp.prep_new_page
40.31 +2.5 42.80 perf-profile.children.cycles-pp.do_rw_once
39.77 -39.8 0.00 perf-profile.self.cycles-pp.clear_subpage
9.54 -0.3 9.23 perf-profile.self.cycles-pp.do_access
0.55 -0.0 0.53 perf-profile.self.cycles-pp.prep_new_page
38.35 +2.6 40.96 perf-profile.self.cycles-pp.do_rw_once
0.36 ± 2% +38.0 38.32 perf-profile.self.cycles-pp.folio_zero_user
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linus:master] [mm] 9890ecab6a: vm-scalability.throughput 3.8% regression
From: Ankur Arora @ 2026-03-11 19:04 UTC
To: kernel test robot
Cc: Ankur Arora, oe-lkp, lkp, linux-kernel, Andrew Morton,
Raghavendra K T, David Hildenbrand, Andy Lutomirski,
Borislav Petkov (AMD), Boris Ostrovsky, H. Peter Anvin,
Ingo Molnar, Konrad Rzeszutek Wilk, Lance Yang, Liam R. Howlett,
Li Zhe, Lorenzo Stoakes, Mateusz Guzik, Matthew Wilcox,
Michal Hocko, Mike Rapoport, Peter Zijlstra, Suren Baghdasaryan,
Thomas Gleixner, Vlastimil Babka, linux-mm
kernel test robot <oliver.sang@intel.com> writes:
> Hello,
>
> kernel test robot noticed a 3.8% regression of vm-scalability.throughput on:
>
[ ... ]
> testcase: vm-scalability
> config: x86_64-rhel-9.4
> compiler: gcc-14
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> parameters:
>
> runtime: 300s
> size: 8T
> test: anon-w-seq-mt
> cpufreq_governor: performance
This test exercises the THP sequential zeroing path.
> 15142 -4.2% 14512 vm-scalability.time.system_time
> 16939 +6.1% 17975 vm-scalability.time.user_time
stime drops because folio_zero_user() is more efficient. But utime goes up
because of a higher cache-miss rate on the user side, since folio_zero_user()
now clears pages sequentially instead of in the earlier left-right fashion:
> 61.69 +9.8 71.51 perf-stat.i.cache-miss-rate%
> 6.147e+08 +10.7% 6.805e+08 perf-stat.i.cache-misses
> 9.904e+08 -4.7% 9.436e+08 perf-stat.i.cache-references
> 2.17 +8.2% 2.35 perf-stat.i.cpi
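To make the ordering difference concrete, here is a minimal userspace sketch
(illustrative only, not the kernel's actual code: clear_page() below is just a
memset() stand-in, and the old process_huge_page() traversal is more involved
than the convergent loop shown here):

  #include <stdlib.h>
  #include <string.h>

  #define PAGE_SIZE 4096UL

  static void clear_page(void *p)
  {
          memset(p, 0, PAGE_SIZE);
  }

  /*
   * Old order (roughly what the process_huge_page() based code did):
   * clear the subpages far from the faulting index first, converging
   * on it from both ends, so the faulting subpage is cleared last and
   * is still cache-hot when the faulting thread returns to userspace.
   */
  static void clear_converging(char *base, unsigned long nr,
                               unsigned long fault_idx)
  {
          unsigned long l = 0, r = nr - 1;

          while (l < fault_idx || r > fault_idx) {
                  if (fault_idx - l >= r - fault_idx)
                          clear_page(base + l++ * PAGE_SIZE);
                  else
                          clear_page(base + r-- * PAGE_SIZE);
          }
          clear_page(base + fault_idx * PAGE_SIZE);  /* hot page last */
  }

  /*
   * New order (9890ecab6a): one forward sweep over the whole folio.
   * Friendlier to wide clearing primitives, but for a folio larger
   * than the cache the earliest-cleared subpages are evicted by the
   * time the sweep finishes.
   */
  static void clear_sequential(char *base, unsigned long nr)
  {
          unsigned long i;

          for (i = 0; i < nr; i++)
                  clear_page(base + i * PAGE_SIZE);
  }

  int main(void)
  {
          unsigned long nr = 512;            /* 2 MB folio as 512 x 4 KB */
          char *buf = malloc(nr * PAGE_SIZE);

          if (!buf)
                  return 1;
          clear_converging(buf, nr, 17);     /* fault at subpage 17 */
          clear_sequential(buf, nr);
          free(buf);
          return 0;
  }

The point is only the traversal order: both variants clear the same bytes, but
the convergent order finishes on the subpage userspace is about to touch.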
I had noted similar behaviour with anon-w-seq-hugetlb in 93552c9a3350:

  vm-scalability/anon-w-seq-hugetlb: this workload runs with 384 processes
  (one for each CPU), each zeroing anonymously mapped hugetlb memory which
  is then accessed sequentially.
                         stime                   utime
  discontiguous-page     1739.93 ( +- 6.15% )    1016.61 ( +- 4.75% )
  contiguous-page        1853.70 ( +- 2.51% )    1187.13 ( +- 3.50% )
  batched-pages          1756.75 ( +- 2.98% )    1133.32 ( +- 4.89% )
  neighbourhood-last     1725.18 ( +- 4.59% )    1123.78 ( +- 7.38% )
Both stime and utime respond roughly as expected. There is a fair amount of
run-to-run variation, but the general trend is that stime drops and utime
increases. There are a few oddities, like contiguous-page performing very
differently from batched-pages.
As such, this is likely an uncommon pattern where we saturate the memory
bandwidth (since all CPUs are running the test) and are cache-constrained
at the same time because we access the entire region.
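A rough back-of-envelope (my numbers, assuming roughly 38.5 MB of L3 per
socket on the 8380H, shared by 56 hyperthreads):

  38.5 MB L3 / 56 threads  ~=  0.7 MB of effective LLC per thread
  2 MB THP clear           >>  0.7 MB

so by the time a front-to-back sweep of a 2 MB folio finishes, its
earliest-cleared lines (and likely the faulting subpage) have already been
evicted, whereas the old order at least left the faulting subpage hot.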
Ankur