public inbox for linux-mm@kvack.org
From: Ankur Arora <ankur.a.arora@oracle.com>
To: kernel test robot <oliver.sang@intel.com>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Raghavendra K T <raghavendra.kt@amd.com>,
	David Hildenbrand <david@kernel.org>,
	"Andy Lutomirski" <luto@kernel.org>,
	"Borislav Petkov (AMD)" <bp@alien8.de>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Lance Yang <ioworker0@gmail.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Li Zhe <lizhe.67@bytedance.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Mateusz Guzik <mjguzik@gmail.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org
Subject: Re: [linus:master] [mm]  9890ecab6a:  vm-scalability.throughput 3.8% regression
Date: Wed, 11 Mar 2026 12:04:32 -0700	[thread overview]
Message-ID: <874immur0f.fsf@oracle.com> (raw)
In-Reply-To: <202603101342.297fb270-lkp@intel.com>


kernel test robot <oliver.sang@intel.com> writes:

> Hello,
>
> kernel test robot noticed a 3.8% regression of vm-scalability.throughput on:
>

[ ... ]

> testcase: vm-scalability
> config: x86_64-rhel-9.4
> compiler: gcc-14
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> parameters:
>
> 	runtime: 300s
> 	size: 8T
> 	test: anon-w-seq-mt
> 	cpufreq_governor: performance

This test exercises the THP sequential zeroing path.

>      15142            -4.2%      14512        vm-scalability.time.system_time
>      16939            +6.1%      17975        vm-scalability.time.user_time

stime drops because folio_zero_user() is more efficient. But utime goes
up because of a higher user-side cache-miss rate, since folio_zero_user()
is now clearing sequentially instead of in the earlier left-right fashion:

>      61.69            +9.8       71.51        perf-stat.i.cache-miss-rate%
>  6.147e+08           +10.7%  6.805e+08        perf-stat.i.cache-misses
>  9.904e+08            -4.7%  9.436e+08        perf-stat.i.cache-references
>       2.17            +8.2%       2.35        perf-stat.i.cpi


I had noted similar behaviour with anon-w-seq-hugetlb in 93552c9a3350:

   vm-scalability/anon-w-seq-hugetlb: this workload runs with 384 processes
   (one for each CPU) each zeroing anonymously mapped hugetlb memory which
   is then accessed sequentially.

                            stime                utime

   discontiguous-page      1739.93 ( +- 6.15% )  1016.61 ( +- 4.75% )
   contiguous-page         1853.70 ( +- 2.51% )  1187.13 ( +- 3.50% )
   batched-pages           1756.75 ( +- 2.98% )  1133.32 ( +- 4.89% )
   neighbourhood-last      1725.18 ( +- 4.59% )  1123.78 ( +- 7.38% )

  Both stime and utime respond largely as expected. There is a fair
  amount of run-to-run variation, but the general trend is that the
  stime drops and utime increases. There are a few oddities, such as
  contiguous-page performing very differently from batched-pages.

  As such, this is likely an uncommon pattern where we saturate the
  memory bandwidth (since all CPUs are running the test) and at the
  same time are cache constrained because we access the entire region.


Ankur

> cb431accb36e51b6 9890ecab6ad9c0d3d342469f3b6
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>       0.08 ±  3%      +8.7%       0.09 ±  3%  vm-scalability.free_time
>     357969            -6.6%     334511        vm-scalability.median
>  1.034e+08            -3.8%   99382138        vm-scalability.throughput
>     634243           -13.6%     548120 ±  6%  vm-scalability.time.involuntary_context_switches
>   12706518            -6.6%   11872543        vm-scalability.time.minor_page_faults
>      15142            -4.2%      14512        vm-scalability.time.system_time
>      16939            +6.1%      17975        vm-scalability.time.user_time
>     251227            -6.8%     234071        vm-scalability.time.voluntary_context_switches
>  1.791e+10            -6.6%  1.674e+10        vm-scalability.workload
>       0.30            -7.5%       0.28        turbostat.IPC
>       9203            -5.5%       8693        vmstat.system.cs
>       0.08            +0.0        0.08        mpstat.cpu.all.soft%
>      25.14            +1.5       26.62        mpstat.cpu.all.usr%
>       3.13           +18.3%       3.71        perf-stat.i.MPKI
>   6.22e+10            -6.6%   5.81e+10        perf-stat.i.branch-instructions
>      61.69            +9.8       71.51        perf-stat.i.cache-miss-rate%
>  6.147e+08           +10.7%  6.805e+08        perf-stat.i.cache-misses
>  9.904e+08            -4.7%  9.436e+08        perf-stat.i.cache-references
>       9303            -5.2%       8823        perf-stat.i.context-switches
>       2.17            +8.2%       2.35        perf-stat.i.cpi
>     598.97            -4.6%     571.28        perf-stat.i.cpu-migrations
>   1.95e+11            -6.6%  1.822e+11        perf-stat.i.instructions
>       0.47            -7.2%       0.43        perf-stat.i.ipc
>      43153            -6.5%      40334        perf-stat.i.minor-faults
>      43153            -6.5%      40335        perf-stat.i.page-faults
>       3.16           +18.5%       3.74        perf-stat.overall.MPKI
>       0.02            +0.0        0.03        perf-stat.overall.branch-miss-rate%
>      62.11           +10.1       72.19        perf-stat.overall.cache-miss-rate%
>       2.19            +8.3%       2.37        perf-stat.overall.cpi
>     692.89            -8.6%     633.07        perf-stat.overall.cycles-between-cache-misses
>       0.46            -7.6%       0.42        perf-stat.overall.ipc
>  6.121e+10            -6.8%  5.705e+10        perf-stat.ps.branch-instructions
>  6.054e+08           +10.5%  6.689e+08        perf-stat.ps.cache-misses
>  9.747e+08            -4.9%  9.266e+08        perf-stat.ps.cache-references
>       9124            -5.6%       8613        perf-stat.ps.context-switches
>     583.66            -4.9%     555.21        perf-stat.ps.cpu-migrations
>  1.919e+11            -6.8%  1.789e+11        perf-stat.ps.instructions
>      42389            -6.7%      39549        perf-stat.ps.minor-faults
>      42389            -6.7%      39549        perf-stat.ps.page-faults
>  5.812e+13            -6.5%  5.434e+13        perf-stat.total.instructions
>      40.26           -40.3        0.00        perf-profile.calltrace.cycles-pp.clear_subpage.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
>      40.76            -2.1       38.66        perf-profile.calltrace.cycles-pp.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>      42.59            -2.0       40.61        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
>      42.54            -2.0       40.57        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>      42.54            -2.0       40.57        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
>      42.40            -2.0       40.43        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>      42.32            -2.0       40.36        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>      42.23            -2.0       40.27        perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>      41.70            -2.0       39.74        perf-profile.calltrace.cycles-pp.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
>       0.76            -0.0        0.72        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>       0.72            -0.0        0.68        perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page
>       0.67            -0.0        0.64        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd
>       0.72            -0.0        0.69        perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
>       0.56            -0.0        0.54        perf-profile.calltrace.cycles-pp.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof
>       0.00            +0.8        0.76 ±  2%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
>      30.25            +1.2       31.46        perf-profile.calltrace.cycles-pp.do_rw_once
>      40.49           -40.5        0.00        perf-profile.children.cycles-pp.clear_subpage
>      42.61            -2.0       40.63        perf-profile.children.cycles-pp.asm_exc_page_fault
>      42.55            -2.0       40.58        perf-profile.children.cycles-pp.exc_page_fault
>      42.54            -2.0       40.57        perf-profile.children.cycles-pp.do_user_addr_fault
>      42.40            -2.0       40.43        perf-profile.children.cycles-pp.handle_mm_fault
>      42.33            -2.0       40.36        perf-profile.children.cycles-pp.__handle_mm_fault
>      42.23            -2.0       40.27        perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
>      41.70            -2.0       39.74        perf-profile.children.cycles-pp.vma_alloc_anon_folio_pmd
>      40.83            -1.9       38.92        perf-profile.children.cycles-pp.folio_zero_user
>      63.93            -1.2       62.77        perf-profile.children.cycles-pp.do_access
>       0.95            -0.0        0.91        perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
>       0.78            -0.0        0.74        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
>       0.95            -0.0        0.92        perf-profile.children.cycles-pp.alloc_pages_mpol
>       0.79            -0.0        0.76        perf-profile.children.cycles-pp.get_page_from_freelist
>       0.63            -0.0        0.60        perf-profile.children.cycles-pp.prep_new_page
>      40.31            +2.5       42.80        perf-profile.children.cycles-pp.do_rw_once
>      39.77           -39.8        0.00        perf-profile.self.cycles-pp.clear_subpage
>       9.54            -0.3        9.23        perf-profile.self.cycles-pp.do_access
>       0.55            -0.0        0.53        perf-profile.self.cycles-pp.prep_new_page
>      38.35            +2.6       40.96        perf-profile.self.cycles-pp.do_rw_once
>       0.36 ±  2%     +38.0       38.32        perf-profile.self.cycles-pp.folio_zero_user
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.


      reply	other threads:[~2026-03-11 19:05 UTC|newest]

Thread overview: 2+ messages
2026-03-10  6:39 [linus:master] [mm] 9890ecab6a: vm-scalability.throughput 3.8% regression kernel test robot
2026-03-11 19:04 ` Ankur Arora [this message]
