From: Uladzislau Rezki <urezki@gmail.com>
To: kernel test robot <oliver.sang@intel.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Baoquan He <bhe@redhat.com>,
	Alexander Potapenko <glider@google.com>,
	Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Marco Elver <elver@google.com>, Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org
Subject: Re: [linus:master] [mm/vmalloc]  9c47753167: stress-ng.bigheap.realloc_calls_per_sec 21.3% regression
Date: Mon, 15 Dec 2025 13:19:14 +0100
Message-ID: <aT_8woTbtklin3Bh@milan>
In-Reply-To: <202512121138.986f6a6b-lkp@intel.com>

On Fri, Dec 12, 2025 at 11:27:27AM +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a 21.3% regression of stress-ng.bigheap.realloc_calls_per_sec on:
> 
> 
> commit: 9c47753167a6a585d0305663c6912f042e131c2d ("mm/vmalloc: defer freeing partly initialized vm_struct")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> [still regression on linus/master      c9b47175e9131118e6f221cc8fb81397d62e7c91]
> [still regression on linux-next/master 008d3547aae5bc86fac3eda317489169c3fda112]
> 
> testcase: stress-ng
> config: x86_64-rhel-9.4
> compiler: gcc-14
> test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6767P  CPU @ 2.4GHz (Granite Rapids) with 256G memory
> parameters:
> 
> 	nr_threads: 100%
> 	testtime: 60s
> 	test: bigheap
> 	cpufreq_governor: performance
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202512121138.986f6a6b-lkp@intel.com
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20251212/202512121138.986f6a6b-lkp@intel.com
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp3/bigheap/stress-ng/60s
> 
> commit: 
>   86e968d8ca ("mm/vmalloc: support non-blocking GFP flags in alloc_vmap_area()")
>   9c47753167 ("mm/vmalloc: defer freeing partly initialized vm_struct")
> 
> 86e968d8ca6dc823 9c47753167a6a585d0305663c69 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>     209109 ±  5%     -14.1%     179718 ±  6%  numa-meminfo.node0.PageTables
>    1278595 ±  7%     -10.4%    1145748 ±  2%  sched_debug.cpu.max_idle_balance_cost.max
>      33.90            -3.6%      32.67        turbostat.RAMWatt
>  3.885e+08           -10.9%  3.463e+08        numa-numastat.node0.local_node
>  3.886e+08           -10.8%  3.466e+08        numa-numastat.node0.numa_hit
>  3.881e+08           -10.9%   3.46e+08        numa-numastat.node1.local_node
>  3.883e+08           -10.9%  3.461e+08        numa-numastat.node1.numa_hit
>  3.886e+08           -10.8%  3.466e+08        numa-vmstat.node0.numa_hit
>  3.885e+08           -10.9%  3.463e+08        numa-vmstat.node0.numa_local
>  3.883e+08           -10.9%  3.461e+08        numa-vmstat.node1.numa_hit
>  3.881e+08           -10.9%   3.46e+08        numa-vmstat.node1.numa_local
>   48320196           -10.9%   43072080        stress-ng.bigheap.ops
>     785159            -9.8%     708390        stress-ng.bigheap.ops_per_sec
>     879805           -21.3%     692805        stress-ng.bigheap.realloc_calls_per_sec
>      72414            -3.3%      70043        stress-ng.time.involuntary_context_switches
>  7.735e+08           -10.9%  6.895e+08        stress-ng.time.minor_page_faults
>      15385            -1.0%      15224        stress-ng.time.system_time
>     236.00           -10.5%     211.19 ±  2%  stress-ng.time.user_time
>       0.32 ±  4%     +95.1%       0.63 ± 12%  perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
>      16.96 ± 41%   +5031.1%     870.26 ± 40%  perf-sched.sch_delay.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
>       0.32 ±  4%     +95.1%       0.63 ± 12%  perf-sched.total_sch_delay.average.ms
>      16.96 ± 41%   +5031.1%     870.26 ± 40%  perf-sched.total_sch_delay.max.ms
>       4750 ±  4%     -12.2%       4169 ±  4%  perf-sched.total_wait_and_delay.max.ms
>       4750 ±  4%     -12.2%       4169 ±  4%  perf-sched.total_wait_time.max.ms
>       4750 ±  4%     -12.2%       4169 ±  4%  perf-sched.wait_and_delay.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
>       4750 ±  4%     -12.2%       4169 ±  4%  perf-sched.wait_time.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
>   29568942            -2.9%   28712561        proc-vmstat.nr_active_anon
>   28797015            -2.8%   27991137        proc-vmstat.nr_anon_pages
>      99294            -3.7%      95669        proc-vmstat.nr_page_table_pages
>   29568950            -2.9%   28712562        proc-vmstat.nr_zone_active_anon
>   7.77e+08           -10.9%  6.927e+08        proc-vmstat.numa_hit
>  7.766e+08           -10.9%  6.923e+08        proc-vmstat.numa_local
>  7.785e+08           -10.8%  6.941e+08        proc-vmstat.pgalloc_normal
>  7.739e+08           -10.8%  6.899e+08        proc-vmstat.pgfault
>  7.756e+08           -10.6%  6.931e+08        proc-vmstat.pgfree
>       7.68            -3.8%       7.39        perf-stat.i.MPKI
>  2.811e+10            -4.9%  2.672e+10        perf-stat.i.branch-instructions
>       0.06            -0.0        0.05        perf-stat.i.branch-miss-rate%
>   15424402           -14.3%   13220241        perf-stat.i.branch-misses
>      80.75            -2.3       78.42        perf-stat.i.cache-miss-rate%
>  1.037e+09           -11.0%  9.233e+08        perf-stat.i.cache-misses
>  1.217e+09           -10.6%  1.088e+09        perf-stat.i.cache-references
>       2817 ±  2%      -2.8%       2739        perf-stat.i.context-switches
>       7.16            +5.1%       7.53        perf-stat.i.cpi
>       1846 ±  5%     +30.6%       2410 ±  5%  perf-stat.i.cycles-between-cache-misses
>  1.298e+11            -5.9%  1.222e+11        perf-stat.i.instructions
>       0.14            -5.2%       0.13        perf-stat.i.ipc
>     103.98            -9.7%      93.94        perf-stat.i.metric.K/sec
>   13534286           -11.0%   12040965        perf-stat.i.minor-faults
>   13534286           -11.0%   12040965        perf-stat.i.page-faults
>       7.64            -5.3%       7.23        perf-stat.overall.MPKI
>       0.05            -0.0        0.05        perf-stat.overall.branch-miss-rate%
>       7.20            +5.3%       7.58        perf-stat.overall.cpi
>     942.28           +11.2%       1047        perf-stat.overall.cycles-between-cache-misses
>       0.14            -5.0%       0.13        perf-stat.overall.ipc
>  2.678e+10            -4.1%  2.569e+10        perf-stat.ps.branch-instructions
>   14559650           -13.3%   12627015        perf-stat.ps.branch-misses
>  9.434e+08           -10.0%  8.491e+08        perf-stat.ps.cache-misses
>  1.112e+09            -9.5%  1.006e+09        perf-stat.ps.cache-references
>  1.235e+11            -4.9%  1.174e+11        perf-stat.ps.instructions
>   12270397           -10.0%   11048367        perf-stat.ps.minor-faults
>   12270398           -10.0%   11048367        perf-stat.ps.page-faults
>  7.755e+12            -5.9%    7.3e+12        perf-stat.total.instructions
>      42.85            -5.2       37.62        perf-profile.calltrace.cycles-pp.__munmap
>      42.85            -5.2       37.62        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      42.85            -5.2       37.62        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
>      42.85            -5.2       37.62        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      42.85            -5.2       37.62        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      42.85            -5.2       37.62        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
>      42.85            -5.2       37.62        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      42.85            -5.2       37.62        perf-profile.calltrace.cycles-pp.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>      42.85            -5.2       37.62        perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
>      41.78 ±  2%      -5.1       36.70        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
>      41.78 ±  2%      -5.1       36.70        perf-profile.calltrace.cycles-pp.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
>      41.78 ±  2%      -5.1       36.70        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas
>      41.78 ±  2%      -5.1       36.70        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.vms_clear_ptes
>      41.51 ±  2%      -5.1       36.45        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
>      41.51 ±  2%      -5.1       36.45        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
>      41.51 ±  2%      -5.1       36.45        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
>      41.65            -5.1       36.60        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
>      41.63            -5.1       36.58        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
>      41.65            -5.1       36.60        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
>      41.46 ±  2%      -5.0       36.41        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
>      40.84 ±  2%      -4.9       35.90        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
>       3.89 ±  4%      -2.4        1.53 ±  8%  perf-profile.calltrace.cycles-pp.si_meminfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       3.84 ±  4%      -2.4        1.49 ±  8%  perf-profile.calltrace.cycles-pp.nr_blockdev_pages.si_meminfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64
>       3.82 ±  4%      -2.3        1.47 ±  9%  perf-profile.calltrace.cycles-pp._raw_spin_lock.nr_blockdev_pages.si_meminfo.do_sysinfo.__do_sys_sysinfo
>       3.74 ±  4%      -2.3        1.43 ±  9%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.nr_blockdev_pages.si_meminfo.do_sysinfo
>       3.10 ±  2%      -0.6        2.45 ±  2%  perf-profile.calltrace.cycles-pp.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
>       1.90            -0.4        1.52        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
>       1.84            -0.4        1.48        perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
>       1.80            -0.4        1.44        perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page
>       1.70            -0.4        1.36        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio
>       1.43 ±  6%      -0.3        1.12 ±  2%  perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
>       1.26 ±  4%      -0.3        0.98 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.do_anonymous_page.__handle_mm_fault.handle_mm_fault
>       1.21            -0.3        0.95        perf-profile.calltrace.cycles-pp.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof
>       1.16 ±  8%      -0.3        0.90 ±  5%  perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
>       1.17            -0.3        0.92        perf-profile.calltrace.cycles-pp.clear_page_erms.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol
>      44.15 ±  2%      +7.5       51.61 ±  2%  perf-profile.calltrace.cycles-pp.do_sysinfo.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe.sysinfo
>      44.32 ±  2%      +7.5       51.79 ±  2%  perf-profile.calltrace.cycles-pp.sysinfo
>      44.30 ±  2%      +7.5       51.77 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sysinfo
>      44.30 ±  2%      +7.5       51.77 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sysinfo
>      44.28 ±  2%      +7.5       51.75 ±  2%  perf-profile.calltrace.cycles-pp.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe.sysinfo
>      40.25 ±  2%      +9.8       50.06 ±  2%  perf-profile.calltrace.cycles-pp.si_swapinfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      40.24 ±  2%      +9.8       50.06 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.si_swapinfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64
>      40.08 ±  2%      +9.8       49.92 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.si_swapinfo.do_sysinfo.__do_sys_sysinfo
>      44.76 ±  2%      -6.0       38.80 ±  4%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>      44.44 ±  2%      -5.9       38.56 ±  4%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
>      42.85            -5.2       37.62        perf-profile.children.cycles-pp.__munmap
>      42.85            -5.2       37.62        perf-profile.children.cycles-pp.__vm_munmap
>      42.85            -5.2       37.62        perf-profile.children.cycles-pp.__x64_sys_munmap
>      42.88            -5.2       37.65        perf-profile.children.cycles-pp.do_vmi_align_munmap
>      42.88            -5.2       37.65        perf-profile.children.cycles-pp.vms_clear_ptes
>      42.88            -5.2       37.65        perf-profile.children.cycles-pp.vms_complete_munmap_vmas
>      42.86            -5.2       37.64        perf-profile.children.cycles-pp.do_vmi_munmap
>      42.62            -5.2       37.40        perf-profile.children.cycles-pp.folios_put_refs
>      42.60            -5.2       37.40        perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
>      42.60            -5.2       37.40        perf-profile.children.cycles-pp.free_pages_and_swap_cache
>      41.93            -5.1       36.84        perf-profile.children.cycles-pp.__page_cache_release
>      41.80 ±  2%      -5.1       36.72        perf-profile.children.cycles-pp.unmap_page_range
>      41.80 ±  2%      -5.1       36.72        perf-profile.children.cycles-pp.unmap_vmas
>      41.80 ±  2%      -5.1       36.72        perf-profile.children.cycles-pp.zap_pmd_range
>      41.80 ±  2%      -5.1       36.72        perf-profile.children.cycles-pp.zap_pte_range
>      41.51 ±  2%      -5.1       36.45        perf-profile.children.cycles-pp.tlb_flush_mmu
>       3.89 ±  4%      -2.4        1.53 ±  8%  perf-profile.children.cycles-pp.si_meminfo
>       3.84 ±  4%      -2.4        1.49 ±  8%  perf-profile.children.cycles-pp.nr_blockdev_pages
>       3.11 ±  2%      -0.6        2.46 ±  2%  perf-profile.children.cycles-pp.alloc_anon_folio
>       1.90            -0.4        1.52        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
>       1.89            -0.4        1.52        perf-profile.children.cycles-pp.alloc_pages_mpol
>       1.84            -0.4        1.48        perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
>       1.73            -0.3        1.39        perf-profile.children.cycles-pp.get_page_from_freelist
>       0.56 ± 72%      -0.3        0.22 ±108%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
>       1.45 ±  6%      -0.3        1.14 ±  3%  perf-profile.children.cycles-pp.__pte_offset_map_lock
>       1.22            -0.3        0.96        perf-profile.children.cycles-pp.prep_new_page
>       1.16 ±  7%      -0.3        0.90 ±  5%  perf-profile.children.cycles-pp.__mem_cgroup_charge
>       1.19            -0.3        0.93        perf-profile.children.cycles-pp.clear_page_erms
>       0.26 ±  8%      -0.1        0.16 ±  3%  perf-profile.children.cycles-pp.handle_internal_command
>       0.26 ±  8%      -0.1        0.16 ±  3%  perf-profile.children.cycles-pp.main
>       0.26 ±  8%      -0.1        0.16 ±  3%  perf-profile.children.cycles-pp.run_builtin
>       0.44 ± 10%      -0.1        0.35 ±  6%  perf-profile.children.cycles-pp.free_unref_folios
>       0.25 ±  9%      -0.1        0.16 ±  3%  perf-profile.children.cycles-pp.record__mmap_read_evlist
>       0.40 ± 11%      -0.1        0.31 ±  6%  perf-profile.children.cycles-pp.free_frozen_page_commit
>       0.24 ±  8%      -0.1        0.16 ±  4%  perf-profile.children.cycles-pp.perf_mmap__push
>       0.38 ± 13%      -0.1        0.30 ±  7%  perf-profile.children.cycles-pp.free_pcppages_bulk
>       0.55            -0.1        0.48        perf-profile.children.cycles-pp.sync_regs
>       0.48 ±  4%      -0.1        0.42 ±  2%  perf-profile.children.cycles-pp.native_irq_return_iret
>       0.37 ±  4%      -0.1        0.31 ±  3%  perf-profile.children.cycles-pp.rmqueue
>       0.35 ±  4%      -0.1        0.30 ±  3%  perf-profile.children.cycles-pp.rmqueue_pcplist
>       0.19 ±  6%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.record__pushfn
>       0.18 ±  7%      -0.0        0.13 ±  2%  perf-profile.children.cycles-pp.ksys_write
>       0.17 ±  5%      -0.0        0.13 ±  3%  perf-profile.children.cycles-pp.vfs_write
>       0.28 ±  5%      -0.0        0.24 ±  3%  perf-profile.children.cycles-pp.__rmqueue_pcplist
>       0.31            -0.0        0.27        perf-profile.children.cycles-pp.lru_add
>       0.16 ±  5%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.shmem_file_write_iter
>       0.24 ±  6%      -0.0        0.20 ±  5%  perf-profile.children.cycles-pp.rmqueue_bulk
>       0.16 ±  4%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.generic_perform_write
>       0.24 ±  2%      -0.0        0.20        perf-profile.children.cycles-pp.lru_gen_add_folio
>       0.21            -0.0        0.18        perf-profile.children.cycles-pp.lru_gen_del_folio
>       0.25 ±  2%      -0.0        0.22        perf-profile.children.cycles-pp.zap_present_ptes
>       0.14 ±  2%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.lock_vma_under_rcu
>       0.14 ±  3%      -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.__mod_node_page_state
>       0.13            -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.__perf_sw_event
>       0.06 ±  7%      -0.0        0.05        perf-profile.children.cycles-pp.___pte_offset_map
>       0.09 ±  5%      -0.0        0.08        perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
>       0.08 ±  6%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.vma_merge_extend
>       0.11 ±  3%      -0.0        0.10        perf-profile.children.cycles-pp.__free_one_page
>       0.07            -0.0        0.06        perf-profile.children.cycles-pp.error_entry
>       0.06            -0.0        0.05        perf-profile.children.cycles-pp.__mod_zone_page_state
>       0.11            -0.0        0.10        perf-profile.children.cycles-pp.___perf_sw_event
>       0.10 ±  4%      +0.0        0.11 ±  4%  perf-profile.children.cycles-pp.sched_tick
>       0.21 ±  3%      +0.0        0.24 ±  5%  perf-profile.children.cycles-pp.update_process_times
>       0.22 ±  3%      +0.0        0.26 ±  7%  perf-profile.children.cycles-pp.tick_nohz_handler
>       0.30 ±  4%      +0.0        0.34 ±  6%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>       0.29 ±  4%      +0.0        0.33 ±  6%  perf-profile.children.cycles-pp.hrtimer_interrupt
>       0.39 ±  2%      +0.0        0.43 ±  2%  perf-profile.children.cycles-pp.mremap
>       0.31 ±  4%      +0.0        0.36 ±  5%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>       0.34 ±  3%      +0.0        0.39 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>       0.28 ±  3%      +0.1        0.34 ±  2%  perf-profile.children.cycles-pp.__do_sys_mremap
>       0.28 ±  2%      +0.1        0.34 ±  3%  perf-profile.children.cycles-pp.do_mremap
>       0.11 ±  4%      +0.1        0.17 ±  2%  perf-profile.children.cycles-pp.expand_vma
>       0.00            +0.1        0.08        perf-profile.children.cycles-pp.__vm_enough_memory
>       0.00            +0.1        0.09 ±  5%  perf-profile.children.cycles-pp.vrm_calc_charge
>       0.04 ±141%      +0.1        0.13 ± 16%  perf-profile.children.cycles-pp.add_callchain_ip
>       0.04 ±142%      +0.1        0.14 ± 17%  perf-profile.children.cycles-pp.thread__resolve_callchain_sample
>       0.04 ±142%      +0.1        0.17 ± 15%  perf-profile.children.cycles-pp.__thread__resolve_callchain
>       0.04 ±142%      +0.1        0.18 ± 15%  perf-profile.children.cycles-pp.sample__for_each_callchain_node
>       0.05 ±141%      +0.1        0.18 ± 14%  perf-profile.children.cycles-pp.build_id__mark_dso_hit
>       0.05 ±141%      +0.1        0.19 ± 14%  perf-profile.children.cycles-pp.perf_session__deliver_event
>       0.05 ±141%      +0.1        0.20 ± 14%  perf-profile.children.cycles-pp.__ordered_events__flush
>       0.05 ±141%      +0.1        0.20 ± 33%  perf-profile.children.cycles-pp.perf_session__process_events
>       0.05 ±141%      +0.1        0.20 ± 33%  perf-profile.children.cycles-pp.record__finish_output
>      88.59            +1.5       90.13        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>      45.34 ±  2%      +7.2       52.54 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock
>      44.15 ±  2%      +7.5       51.61 ±  2%  perf-profile.children.cycles-pp.do_sysinfo
>      44.33 ±  2%      +7.5       51.80 ±  2%  perf-profile.children.cycles-pp.sysinfo
>      44.28 ±  2%      +7.5       51.75 ±  2%  perf-profile.children.cycles-pp.__do_sys_sysinfo
>      40.25 ±  2%      +9.8       50.07 ±  2%  perf-profile.children.cycles-pp.si_swapinfo
>       0.55 ± 74%      -0.3        0.22 ±107%  perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
>       1.50 ±  4%      -0.3        1.17        perf-profile.self.cycles-pp._raw_spin_lock
>       1.18            -0.3        0.92        perf-profile.self.cycles-pp.clear_page_erms
>       2.01            -0.2        1.86 ±  3%  perf-profile.self.cycles-pp.stress_bigheap_child
>       0.55            -0.1        0.48        perf-profile.self.cycles-pp.sync_regs
>       0.48 ±  4%      -0.1        0.42 ±  2%  perf-profile.self.cycles-pp.native_irq_return_iret
>       0.14 ±  3%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.get_page_from_freelist
>       0.14 ±  8%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.do_anonymous_page
>       0.14 ±  2%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.rmqueue_bulk
>       0.14            -0.0        0.12        perf-profile.self.cycles-pp.lru_gen_del_folio
>       0.11 ±  3%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.__handle_mm_fault
>       0.15 ±  2%      -0.0        0.13        perf-profile.self.cycles-pp.lru_gen_add_folio
>       0.12 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.zap_present_ptes
>       0.12 ±  4%      -0.0        0.11        perf-profile.self.cycles-pp.__mod_node_page_state
>       0.07 ±  6%      -0.0        0.06        perf-profile.self.cycles-pp.lock_vma_under_rcu
>       0.10            -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.__free_one_page
>       0.11 ±  3%      -0.0        0.10        perf-profile.self.cycles-pp.folios_put_refs
>       0.07            -0.0        0.06        perf-profile.self.cycles-pp.___perf_sw_event
>       0.07            -0.0        0.06        perf-profile.self.cycles-pp.do_user_addr_fault
>       0.07            -0.0        0.06        perf-profile.self.cycles-pp.lru_add
>       0.07            -0.0        0.06        perf-profile.self.cycles-pp.mas_walk
>       0.08            -0.0        0.07        perf-profile.self.cycles-pp.__alloc_frozen_pages_noprof
>       0.06            -0.0        0.05        perf-profile.self.cycles-pp.handle_mm_fault
>       0.06            -0.0        0.05        perf-profile.self.cycles-pp.page_counter_uncharge
>       0.00            +0.1        0.08        perf-profile.self.cycles-pp.__vm_enough_memory
>      88.36            +1.5       89.85        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> 
> 
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
> 
Could you please test the patch below and confirm whether it resolves the regression?

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ecbac900c35f..118de1a8348c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3746,6 +3746,15 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 	return nr_allocated;
 }
 
+static void
+__vm_area_cleanup(struct vm_struct *area)
+{
+	if (area->pages)
+		vfree(area->addr);
+	else
+		free_vm_area(area);
+}
+
 static LLIST_HEAD(pending_vm_area_cleanup);
 static void cleanup_vm_area_work(struct work_struct *work)
 {
@@ -3756,12 +3765,8 @@ static void cleanup_vm_area_work(struct work_struct *work)
 	if (!head)
 		return;
 
-	llist_for_each_entry_safe(area, tmp, head, llnode) {
-		if (!area->pages)
-			free_vm_area(area);
-		else
-			vfree(area->addr);
-	}
+	llist_for_each_entry_safe(area, tmp, head, llnode)
+		__vm_area_cleanup(area);
 }
 
 /*
@@ -3769,8 +3774,11 @@ static void cleanup_vm_area_work(struct work_struct *work)
  * of partially initialized vm_struct in error paths.
  */
 static DECLARE_WORK(cleanup_vm_area, cleanup_vm_area_work);
-static void defer_vm_area_cleanup(struct vm_struct *area)
+static void vm_area_cleanup(struct vm_struct *area, bool can_block)
 {
+	if (can_block)
+		return __vm_area_cleanup(area);
+
 	if (llist_add(&area->llnode, &pending_vm_area_cleanup))
 		schedule_work(&cleanup_vm_area);
 }
@@ -3915,7 +3923,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	return area->addr;
 
 fail:
-	defer_vm_area_cleanup(area);
+	vm_area_cleanup(area, gfpflags_allow_blocking(gfp_mask));
 	return NULL;
 }
<snip>
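
For readers who want to see the idea outside of mm/vmalloc.c, here is a
minimal user-space sketch of the same pattern: release the object right
away when the caller is allowed to block, otherwise queue it and let a
later pass free it. Every name in it (struct area, area_cleanup(),
pending_add(), pending_drain()) is a hypothetical stand-in and not a
kernel API. The plain singly linked list only mimics llist, and
pending_drain() stands in for the work item, so this shows the control
flow only, not the locking or the scheduling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a partly initialized vm_struct. */
struct area {
	void *pages;		/* non-NULL once backing storage was attached */
	struct area *next;	/* link used only while on the pending list */
};

static struct area *pending_head;	/* mimics the lock-less pending list */
static int freed_inline;		/* areas released directly by the caller */
static int freed_deferred;		/* areas released by the drain pass */

static struct area *area_new(bool with_pages)
{
	struct area *a = calloc(1, sizeof(*a));

	if (a && with_pages)
		a->pages = malloc(16);
	return a;
}

/* Release everything the area owns; free(NULL) is a no-op. */
static void area_release(struct area *a)
{
	free(a->pages);
	free(a);
}

/* Push onto the pending list; returns true if the list was empty before. */
static bool pending_add(struct area *a)
{
	bool was_empty = (pending_head == NULL);

	a->next = pending_head;
	pending_head = a;
	return was_empty;
}

/* Stand-in for the work item: detach the whole list, then free each entry. */
static void pending_drain(void)
{
	struct area *a = pending_head;

	pending_head = NULL;
	while (a) {
		struct area *next = a->next;	/* read before the entry is freed */

		area_release(a);
		freed_deferred++;
		a = next;
	}
}

/* Free now if blocking is allowed, otherwise defer to the drain pass. */
static void area_cleanup(struct area *a, bool can_block)
{
	assert(a != NULL);

	if (can_block) {
		area_release(a);
		freed_inline++;
		return;
	}

	/* In the patch, a "list was empty" result is what triggers schedule_work(). */
	(void)pending_add(a);
}
```

Calling area_cleanup(a, true) frees the area on the spot and leaves the
pending list untouched, while area_cleanup(a, false) only queues it until
pending_drain() runs. In the patch the can_block argument comes from
gfpflags_allow_blocking(gfp_mask), so only non-blocking allocations would
still go through the deferred path.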


--
Uladzislau Rezki


