From: Ankur Arora <ankur.a.arora@oracle.com>
To: kernel test robot <oliver.sang@intel.com>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Raghavendra K T <raghavendra.kt@amd.com>,
David Hildenbrand <david@kernel.org>,
"Andy Lutomirski" <luto@kernel.org>,
"Borislav Petkov (AMD)" <bp@alien8.de>,
"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Lance Yang <ioworker0@gmail.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Li Zhe <lizhe.67@bytedance.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Mateusz Guzik <mjguzik@gmail.com>,
"Matthew Wilcox" <willy@infradead.org>,
Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Suren Baghdasaryan <surenb@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Vlastimil Babka <vbabka@suse.cz>,
linux-mm@kvack.org
Subject: Re: [linus:master] [mm] 9890ecab6a: vm-scalability.throughput 3.8% regression
Date: Wed, 11 Mar 2026 12:04:32 -0700 [thread overview]
Message-ID: <874immur0f.fsf@oracle.com> (raw)
In-Reply-To: <202603101342.297fb270-lkp@intel.com>
kernel test robot <oliver.sang@intel.com> writes:
> Hello,
>
> kernel test robot noticed a 3.8% regression of vm-scalability.throughput on:
>
[ ... ]
> testcase: vm-scalability
> config: x86_64-rhel-9.4
> compiler: gcc-14
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> parameters:
>
> runtime: 300s
> size: 8T
> test: anon-w-seq-mt
> cpufreq_governor: performance
This test exercises the THP sequential zeroing path.
> 15142 -4.2% 14512 vm-scalability.time.system_time
> 16939 +6.1% 17975 vm-scalability.time.user_time
stime drops because folio_zero_user() is more efficient, but utime goes
up because of a higher user-side cache-miss rate: folio_zero_user() now
clears the huge page sequentially instead of in the earlier left-right
fashion, which cleared the faulting subpage's neighbourhood last:
> 61.69 +9.8 71.51 perf-stat.i.cache-miss-rate%
> 6.147e+08 +10.7% 6.805e+08 perf-stat.i.cache-misses
> 9.904e+08 -4.7% 9.436e+08 perf-stat.i.cache-references
> 2.17 +8.2% 2.35 perf-stat.i.cpi
I had noted similar behaviour with anon-w-seq-hugetlb in 93552c9a3350:
vm-scalability/anon-w-seq-hugetlb: this workload runs with 384 processes
(one for each CPU), each zeroing anonymously mapped hugetlb memory which
is then accessed sequentially.
stime utime
discontiguous-page 1739.93 ( +- 6.15% ) 1016.61 ( +- 4.75% )
contiguous-page 1853.70 ( +- 2.51% ) 1187.13 ( +- 3.50% )
batched-pages 1756.75 ( +- 2.98% ) 1133.32 ( +- 4.89% )
neighbourhood-last 1725.18 ( +- 4.59% ) 1123.78 ( +- 7.38% )
Both stime and utime respond largely as expected: there is a fair
amount of run-to-run variation, but the general trend is that stime
drops and utime increases. There are a few oddities, such as
contiguous-page performing quite differently from batched-pages.
As such, this is likely an uncommon pattern: we saturate memory
bandwidth (since all CPUs are running the test) and at the same time
are cache constrained because each thread accesses the entire region
it just cleared.
Ankur
> cb431accb36e51b6 9890ecab6ad9c0d3d342469f3b6
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 0.08 ± 3% +8.7% 0.09 ± 3% vm-scalability.free_time
> 357969 -6.6% 334511 vm-scalability.median
> 1.034e+08 -3.8% 99382138 vm-scalability.throughput
> 634243 -13.6% 548120 ± 6% vm-scalability.time.involuntary_context_switches
> 12706518 -6.6% 11872543 vm-scalability.time.minor_page_faults
> 15142 -4.2% 14512 vm-scalability.time.system_time
> 16939 +6.1% 17975 vm-scalability.time.user_time
> 251227 -6.8% 234071 vm-scalability.time.voluntary_context_switches
> 1.791e+10 -6.6% 1.674e+10 vm-scalability.workload
> 0.30 -7.5% 0.28 turbostat.IPC
> 9203 -5.5% 8693 vmstat.system.cs
> 0.08 +0.0 0.08 mpstat.cpu.all.soft%
> 25.14 +1.5 26.62 mpstat.cpu.all.usr%
> 3.13 +18.3% 3.71 perf-stat.i.MPKI
> 6.22e+10 -6.6% 5.81e+10 perf-stat.i.branch-instructions
> 61.69 +9.8 71.51 perf-stat.i.cache-miss-rate%
> 6.147e+08 +10.7% 6.805e+08 perf-stat.i.cache-misses
> 9.904e+08 -4.7% 9.436e+08 perf-stat.i.cache-references
> 9303 -5.2% 8823 perf-stat.i.context-switches
> 2.17 +8.2% 2.35 perf-stat.i.cpi
> 598.97 -4.6% 571.28 perf-stat.i.cpu-migrations
> 1.95e+11 -6.6% 1.822e+11 perf-stat.i.instructions
> 0.47 -7.2% 0.43 perf-stat.i.ipc
> 43153 -6.5% 40334 perf-stat.i.minor-faults
> 43153 -6.5% 40335 perf-stat.i.page-faults
> 3.16 +18.5% 3.74 perf-stat.overall.MPKI
> 0.02 +0.0 0.03 perf-stat.overall.branch-miss-rate%
> 62.11 +10.1 72.19 perf-stat.overall.cache-miss-rate%
> 2.19 +8.3% 2.37 perf-stat.overall.cpi
> 692.89 -8.6% 633.07 perf-stat.overall.cycles-between-cache-misses
> 0.46 -7.6% 0.42 perf-stat.overall.ipc
> 6.121e+10 -6.8% 5.705e+10 perf-stat.ps.branch-instructions
> 6.054e+08 +10.5% 6.689e+08 perf-stat.ps.cache-misses
> 9.747e+08 -4.9% 9.266e+08 perf-stat.ps.cache-references
> 9124 -5.6% 8613 perf-stat.ps.context-switches
> 583.66 -4.9% 555.21 perf-stat.ps.cpu-migrations
> 1.919e+11 -6.8% 1.789e+11 perf-stat.ps.instructions
> 42389 -6.7% 39549 perf-stat.ps.minor-faults
> 42389 -6.7% 39549 perf-stat.ps.page-faults
> 5.812e+13 -6.5% 5.434e+13 perf-stat.total.instructions
> 40.26 -40.3 0.00 perf-profile.calltrace.cycles-pp.clear_subpage.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
> 40.76 -2.1 38.66 perf-profile.calltrace.cycles-pp.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
> 42.59 -2.0 40.61 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
> 42.54 -2.0 40.57 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
> 42.54 -2.0 40.57 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
> 42.40 -2.0 40.43 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
> 42.32 -2.0 40.36 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> 42.23 -2.0 40.27 perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
> 41.70 -2.0 39.74 perf-profile.calltrace.cycles-pp.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> 0.76 -0.0 0.72 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
> 0.72 -0.0 0.68 perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page
> 0.67 -0.0 0.64 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd
> 0.72 -0.0 0.69 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
> 0.56 -0.0 0.54 perf-profile.calltrace.cycles-pp.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof
> 0.00 +0.8 0.76 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
> 30.25 +1.2 31.46 perf-profile.calltrace.cycles-pp.do_rw_once
> 40.49 -40.5 0.00 perf-profile.children.cycles-pp.clear_subpage
> 42.61 -2.0 40.63 perf-profile.children.cycles-pp.asm_exc_page_fault
> 42.55 -2.0 40.58 perf-profile.children.cycles-pp.exc_page_fault
> 42.54 -2.0 40.57 perf-profile.children.cycles-pp.do_user_addr_fault
> 42.40 -2.0 40.43 perf-profile.children.cycles-pp.handle_mm_fault
> 42.33 -2.0 40.36 perf-profile.children.cycles-pp.__handle_mm_fault
> 42.23 -2.0 40.27 perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
> 41.70 -2.0 39.74 perf-profile.children.cycles-pp.vma_alloc_anon_folio_pmd
> 40.83 -1.9 38.92 perf-profile.children.cycles-pp.folio_zero_user
> 63.93 -1.2 62.77 perf-profile.children.cycles-pp.do_access
> 0.95 -0.0 0.91 perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
> 0.78 -0.0 0.74 perf-profile.children.cycles-pp.vma_alloc_folio_noprof
> 0.95 -0.0 0.92 perf-profile.children.cycles-pp.alloc_pages_mpol
> 0.79 -0.0 0.76 perf-profile.children.cycles-pp.get_page_from_freelist
> 0.63 -0.0 0.60 perf-profile.children.cycles-pp.prep_new_page
> 40.31 +2.5 42.80 perf-profile.children.cycles-pp.do_rw_once
> 39.77 -39.8 0.00 perf-profile.self.cycles-pp.clear_subpage
> 9.54 -0.3 9.23 perf-profile.self.cycles-pp.do_access
> 0.55 -0.0 0.53 perf-profile.self.cycles-pp.prep_new_page
> 38.35 +2.6 40.96 perf-profile.self.cycles-pp.do_rw_once
> 0.36 ± 2% +38.0 38.32 perf-profile.self.cycles-pp.folio_zero_user
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
2026-03-10 6:39 [linus:master] [mm] 9890ecab6a: vm-scalability.throughput 3.8% regression kernel test robot
2026-03-11 19:04 ` Ankur Arora [this message]