linux-api.vger.kernel.org archive mirror
* [linus:master] [mseal]  8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
From: kernel test robot @ 2024-08-04  8:59 UTC
  To: Jeff Xu
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Kees Cook,
	Liam R. Howlett, Pedro Falcato, Dave Hansen, Greg Kroah-Hartman,
	Guenter Roeck, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jorge Lucangeli Obes, Linus Torvalds, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin, oliver.sang



Hello,

kernel test robot noticed a -4.4% regression of stress-ng.pagemove.page_remaps_per_sec on:


commit: 8be7258aad44b5e25977a98db136f677fa6f4370 ("mseal: add mseal syscall")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
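
As a sanity check on the headline figure, the relative change can be recomputed
from the base and new values listed in the comparison table further down
(205312 and 196176 page remaps per second). This is a minimal Python sketch;
the helper name pct_change is hypothetical and not part of the LKP tooling.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# Values copied from the stress-ng rows of the comparison table.
remaps = pct_change(205312, 196176)
ops_per_sec = pct_change(306014, 293262)

print(f"page_remaps_per_sec: {remaps:+.1f}%")    # -4.4%
print(f"ops_per_sec:         {ops_per_sec:+.1f}%")  # -4.2%
```

Both results agree with the %change column reported for those two metrics.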

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: pagemove
	cpufreq_governor: performance


In addition, the commit has a significant impact on the following test:

+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.pkey.ops_per_sec -3.6% regression                                      |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                                |
|                  | nr_threads=100%                                                                             |
|                  | test=pkey                                                                                   |
|                  | testtime=60s                                                                                |
+------------------+---------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202408041602.caa0372-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240804/202408041602.caa0372-oliver.sang@intel.com
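
Each data row of the comparison that follows has the layout: base value,
optional "± N%" stddev, change, new value, optional "± N%" stddev, metric name.
The change column carries a "%" suffix where it is a relative change and no
suffix where it is the plain difference of the two values (as in the
perf-profile rows). Below is a minimal Python sketch for reading such rows,
assuming this layout holds for every row; the names ROW and parse_row are
hypothetical and not part of any LKP tool.

```python
import re

# One data row: base [± sd%] change[%] new [± sd%] metric
ROW = re.compile(
    r"^\s*(?P<base>[0-9.e+]+)"              # e.g. 205312 or 3.354e+10
    r"(?:\s*±\s*(?P<base_sd>\d+)%)?"        # optional stddev of the base value
    r"\s+(?P<change>[+-][0-9.]+)(?P<pct>%?)"
    r"\s+(?P<new>[0-9.e+]+)"
    r"(?:\s*±\s*(?P<new_sd>\d+)%)?"         # optional stddev of the new value
    r"\s+(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Return a dict for a data row, or None for headers and other lines."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m["metric"],
        "base": float(m["base"]),
        "new": float(m["new"]),
        "change": float(m["change"]),
        # True: change is a relative percentage; False: plain difference.
        "relative": m["pct"] == "%",
        "base_stddev": int(m["base_sd"]) if m["base_sd"] else None,
        "new_stddev": int(m["new_sd"]) if m["new_sd"] else None,
    }


row = parse_row(
    "    205312            -4.4%     196176        "
    "stress-ng.pagemove.page_remaps_per_sec"
)
print(row["metric"], row["base"], row["new"], row["change"], row["relative"])
```

Lines that do not match, such as the "%stddev %change %stddev" header, yield
None and can simply be skipped.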

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pagemove/stress-ng/60s

commit: 
  ff388fe5c4 ("mseal: wire up mseal syscall")
  8be7258aad ("mseal: add mseal syscall")

ff388fe5c481d39c 8be7258aad44b5e25977a98db13 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  41625945            -4.3%   39842322        proc-vmstat.numa_hit
  41559175            -4.3%   39774160        proc-vmstat.numa_local
  77484314            -4.4%   74105555        proc-vmstat.pgalloc_normal
  77205752            -4.4%   73826672        proc-vmstat.pgfree
  18361466            -4.2%   17596652        stress-ng.pagemove.ops
    306014            -4.2%     293262        stress-ng.pagemove.ops_per_sec
    205312            -4.4%     196176        stress-ng.pagemove.page_remaps_per_sec
      4961            +1.0%       5013        stress-ng.time.percent_of_cpu_this_job_got
      2917            +1.2%       2952        stress-ng.time.system_time
      1.07            -6.6%       1.00        perf-stat.i.MPKI
 3.354e+10            +3.5%  3.473e+10        perf-stat.i.branch-instructions
 1.795e+08            -4.2%  1.719e+08        perf-stat.i.cache-misses
 2.376e+08            -4.1%  2.279e+08        perf-stat.i.cache-references
      1.13            -3.0%       1.10        perf-stat.i.cpi
      1077            +4.3%       1124        perf-stat.i.cycles-between-cache-misses
 1.717e+11            +2.7%  1.762e+11        perf-stat.i.instructions
      0.88            +3.1%       0.91        perf-stat.i.ipc
      1.05            -6.8%       0.97        perf-stat.overall.MPKI
      0.25 ±  2%      -0.0        0.24        perf-stat.overall.branch-miss-rate%
      1.13            -3.0%       1.10        perf-stat.overall.cpi
      1084            +4.0%       1127        perf-stat.overall.cycles-between-cache-misses
      0.88            +3.1%       0.91        perf-stat.overall.ipc
 3.298e+10            +3.5%  3.415e+10        perf-stat.ps.branch-instructions
 1.764e+08            -4.3%  1.689e+08        perf-stat.ps.cache-misses
 2.336e+08            -4.1%   2.24e+08        perf-stat.ps.cache-references
    194.57            -2.4%     189.96 ±  2%  perf-stat.ps.cpu-migrations
 1.688e+11            +2.7%  1.733e+11        perf-stat.ps.instructions
 1.036e+13            +3.0%  1.068e+13        perf-stat.total.instructions
     75.12            -1.9       73.22        perf-profile.calltrace.cycles-pp.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
     36.84            -1.6       35.29        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
     24.90            -1.2       23.72        perf-profile.calltrace.cycles-pp.copy_vma.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     19.89            -0.9       18.98        perf-profile.calltrace.cycles-pp.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
     10.56 ±  2%      -0.8        9.78 ±  2%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
     10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
     10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     10.57 ±  2%      -0.8        9.80 ±  2%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     10.52 ±  2%      -0.8        9.75 ±  2%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
     10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
     10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
     10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
     14.75            -0.7       14.07        perf-profile.calltrace.cycles-pp.vma_merge.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      1.50            -0.6        0.94        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
      5.88 ±  2%      -0.4        5.47 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      7.80            -0.3        7.47        perf-profile.calltrace.cycles-pp.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.55 ±  2%      -0.3        4.24 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
      6.76            -0.3        6.45        perf-profile.calltrace.cycles-pp.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      6.15            -0.3        5.86        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      8.22            -0.3        7.93        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      6.12            -0.3        5.87        perf-profile.calltrace.cycles-pp.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      5.74            -0.2        5.50        perf-profile.calltrace.cycles-pp.move_ptes.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
      3.16 ±  2%      -0.2        2.94        perf-profile.calltrace.cycles-pp.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      5.50            -0.2        5.28        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      1.36            -0.2        1.14        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
      5.15            -0.2        4.94        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap
      5.51            -0.2        5.31        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_merge.copy_vma.move_vma.__do_sys_mremap
      5.16            -0.2        4.97        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma.move_vma
      2.24            -0.2        2.05        perf-profile.calltrace.cycles-pp.find_vma_prev.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      2.60 ±  2%      -0.2        2.42 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs
      4.67            -0.2        4.49        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma
      3.41            -0.2        3.23        perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
      3.00            -0.2        2.83 ±  2%  perf-profile.calltrace.cycles-pp.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      0.96            -0.2        0.80        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.do_munmap.mremap_to
      4.04            -0.2        3.88        perf-profile.calltrace.cycles-pp.vm_area_dup.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      3.20 ±  2%      -0.2        3.04 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
      3.53            -0.1        3.38        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma.__do_sys_mremap
      3.40            -0.1        3.26        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma.__do_sys_mremap
      2.20 ±  2%      -0.1        2.06 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      1.84 ±  3%      -0.1        1.71 ±  3%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap
      1.78 ±  2%      -0.1        1.65 ±  3%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_merge.copy_vma.move_vma.__do_sys_mremap
      2.69            -0.1        2.56        perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.78 ±  2%      -0.1        1.66 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
      1.36 ±  2%      -0.1        1.23 ±  2%  perf-profile.calltrace.cycles-pp.rcu_cblist_dequeue.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      0.95            -0.1        0.83        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap
      3.29            -0.1        3.17        perf-profile.calltrace.cycles-pp.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      2.08            -0.1        1.96        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_merge.copy_vma.move_vma.__do_sys_mremap
      1.43 ±  3%      -0.1        1.32 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_merge.copy_vma.move_vma
      2.21            -0.1        2.10        perf-profile.calltrace.cycles-pp.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      2.47            -0.1        2.36        perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma
      2.21            -0.1        2.12        perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables
      1.41            -0.1        1.32        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      1.26            -0.1        1.18        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap
      1.82            -0.1        1.75        perf-profile.calltrace.cycles-pp.vma_link.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      0.71            -0.1        0.63        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge.copy_vma.move_vma.__do_sys_mremap
      1.29            -0.1        1.22        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma
      0.61            -0.1        0.54        perf-profile.calltrace.cycles-pp.mas_prev_setup.mas_prev.vma_merge.copy_vma.move_vma
      1.36            -0.1        1.29        perf-profile.calltrace.cycles-pp.kmem_cache_free.unlink_anon_vmas.free_pgtables.unmap_region.do_vmi_align_munmap
      1.40            -0.1        1.33        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma.move_vma
      0.70            -0.1        0.64        perf-profile.calltrace.cycles-pp.mas_walk.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
      1.23            -0.1        1.17        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma
      1.66            -0.1        1.60        perf-profile.calltrace.cycles-pp.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.16            -0.1        1.10        perf-profile.calltrace.cycles-pp.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      0.96            -0.1        0.90        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.unlink_anon_vmas.free_pgtables.unmap_region
      1.14            -0.1        1.08        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
      0.79            -0.1        0.74        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
      1.04            -0.1        1.00        perf-profile.calltrace.cycles-pp.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.58            -0.0        0.53        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.mremap
      0.61            -0.0        0.56        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
      0.56            -0.0        0.52        perf-profile.calltrace.cycles-pp.mas_find.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
      0.57            -0.0        0.53 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
      0.78            -0.0        0.74        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
      0.88            -0.0        0.84        perf-profile.calltrace.cycles-pp.mtree_load.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64
      0.70            -0.0        0.66        perf-profile.calltrace.cycles-pp.__call_rcu_common.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.mremap
      0.97            -0.0        0.93        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      1.11            -0.0        1.08        perf-profile.calltrace.cycles-pp.clear_bhb_loop.mremap
      0.75            -0.0        0.72        perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma
      0.74            -0.0        0.71        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap
      0.60 ±  2%      -0.0        0.57        perf-profile.calltrace.cycles-pp.security_mmap_addr.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64
      0.67 ±  2%      -0.0        0.64        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma
      0.82            -0.0        0.79        perf-profile.calltrace.cycles-pp.mtree_load.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
      0.63            -0.0        0.60        perf-profile.calltrace.cycles-pp.mas_prev_slot.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      0.99            -0.0        0.96        perf-profile.calltrace.cycles-pp.mt_find.vma_merge.copy_vma.move_vma.__do_sys_mremap
      0.62 ±  2%      -0.0        0.59        perf-profile.calltrace.cycles-pp.get_old_pud.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
      0.87            -0.0        0.84        perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
      0.78            -0.0        0.75        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_link.copy_vma.move_vma.__do_sys_mremap
      0.64            -0.0        0.62        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_link.copy_vma.move_vma.__do_sys_mremap
      0.90            -0.0        0.87        perf-profile.calltrace.cycles-pp.mtree_load.vma_merge.copy_vma.move_vma.__do_sys_mremap
      0.54            -0.0        0.52        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.move_ptes.move_page_tables.move_vma.__do_sys_mremap
      1.04            +0.0        1.08        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
      0.76            +0.1        0.83        perf-profile.calltrace.cycles-pp.__madvise
      0.63            +0.1        0.70        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.62            +0.1        0.70        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.66            +0.1        0.74        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
      0.66            +0.1        0.74        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     87.74            +0.7       88.45        perf-profile.calltrace.cycles-pp.mremap
      0.00            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.do_munmap
      0.00            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.mremap_to.__do_sys_mremap
     84.88            +0.9       85.77        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mremap
     84.73            +0.9       85.62        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
      0.00            +0.9        0.92 ±  2%  perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.move_vma
     83.84            +0.9       84.78        perf-profile.calltrace.cycles-pp.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
      0.00            +1.1        1.06        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64
      0.00            +1.2        1.21        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to
      2.07            +1.5        3.55        perf-profile.calltrace.cycles-pp.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.58            +1.5        3.07        perf-profile.calltrace.cycles-pp.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64
      0.00            +1.5        1.52        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap
      0.00            +1.6        1.57        perf-profile.calltrace.cycles-pp.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.7        1.72        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
      0.00            +2.0        2.01        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
      5.39            +2.9        8.32        perf-profile.calltrace.cycles-pp.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
     75.29            -1.9       73.37        perf-profile.children.cycles-pp.move_vma
     37.06            -1.6       35.50        perf-profile.children.cycles-pp.do_vmi_align_munmap
     24.98            -1.2       23.80        perf-profile.children.cycles-pp.copy_vma
     19.99            -1.0       19.02        perf-profile.children.cycles-pp.handle_softirqs
     19.97            -1.0       19.00        perf-profile.children.cycles-pp.rcu_core
     19.95            -1.0       18.98        perf-profile.children.cycles-pp.rcu_do_batch
     19.98            -0.9       19.06        perf-profile.children.cycles-pp.__split_vma
     17.55            -0.8       16.76        perf-profile.children.cycles-pp.kmem_cache_free
     10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.children.cycles-pp.run_ksoftirqd
     10.57 ±  2%      -0.8        9.80 ±  2%  perf-profile.children.cycles-pp.smpboot_thread_fn
     15.38            -0.8       14.62        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
     10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.children.cycles-pp.kthread
     10.62 ±  2%      -0.8        9.86 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
     10.62 ±  2%      -0.8        9.86 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
     15.14            -0.7       14.44        perf-profile.children.cycles-pp.vma_merge
     12.08            -0.5       11.55        perf-profile.children.cycles-pp.__slab_free
     12.11            -0.5       11.62        perf-profile.children.cycles-pp.mas_wr_store_entry
     10.86            -0.5       10.39        perf-profile.children.cycles-pp.vm_area_dup
     11.89            -0.5       11.44        perf-profile.children.cycles-pp.mas_store_prealloc
      8.49            -0.4        8.06        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      9.88            -0.4        9.49        perf-profile.children.cycles-pp.mas_wr_node_store
      7.91            -0.3        7.58        perf-profile.children.cycles-pp.move_page_tables
      6.06            -0.3        5.78        perf-profile.children.cycles-pp.vm_area_free_rcu_cb
      8.28            -0.3        8.00        perf-profile.children.cycles-pp.unmap_region
      6.69            -0.3        6.42        perf-profile.children.cycles-pp.vma_complete
      5.06            -0.3        4.80        perf-profile.children.cycles-pp.mas_preallocate
      5.82            -0.2        5.57        perf-profile.children.cycles-pp.move_ptes
      4.24            -0.2        4.01        perf-profile.children.cycles-pp.anon_vma_clone
      3.50            -0.2        3.30        perf-profile.children.cycles-pp.down_write
      2.44            -0.2        2.25        perf-profile.children.cycles-pp.find_vma_prev
      3.46            -0.2        3.28        perf-profile.children.cycles-pp.___slab_alloc
      3.45            -0.2        3.27        perf-profile.children.cycles-pp.free_pgtables
      2.54            -0.2        2.37        perf-profile.children.cycles-pp.rcu_cblist_dequeue
      3.35            -0.2        3.18        perf-profile.children.cycles-pp.__memcg_slab_free_hook
      2.93            -0.2        2.78        perf-profile.children.cycles-pp.mas_alloc_nodes
      2.28 ±  2%      -0.2        2.12 ±  2%  perf-profile.children.cycles-pp.vma_prepare
      3.46            -0.1        3.32        perf-profile.children.cycles-pp.flush_tlb_mm_range
      3.41            -0.1        3.27 ±  2%  perf-profile.children.cycles-pp.mod_objcg_state
      2.76            -0.1        2.63        perf-profile.children.cycles-pp.unlink_anon_vmas
      3.41            -0.1        3.28        perf-profile.children.cycles-pp.mas_store_gfp
      2.21            -0.1        2.09        perf-profile.children.cycles-pp.__cond_resched
      2.04            -0.1        1.94        perf-profile.children.cycles-pp.allocate_slab
      2.10            -0.1        2.00        perf-profile.children.cycles-pp.__call_rcu_common
      2.51            -0.1        2.40        perf-profile.children.cycles-pp.flush_tlb_func
      1.04            -0.1        0.94        perf-profile.children.cycles-pp.mas_prev
      2.71            -0.1        2.61        perf-profile.children.cycles-pp.mtree_load
      2.23            -0.1        2.14        perf-profile.children.cycles-pp.native_flush_tlb_one_user
      0.22 ±  5%      -0.1        0.13 ± 13%  perf-profile.children.cycles-pp.vm_stat_account
      0.95            -0.1        0.87        perf-profile.children.cycles-pp.mas_prev_setup
      1.65            -0.1        1.57        perf-profile.children.cycles-pp.mas_wr_walk
      1.84            -0.1        1.76        perf-profile.children.cycles-pp.up_write
      1.27            -0.1        1.20        perf-profile.children.cycles-pp.mas_prev_slot
      1.84            -0.1        1.77        perf-profile.children.cycles-pp.vma_link
      1.39            -0.1        1.32        perf-profile.children.cycles-pp.shuffle_freelist
      0.96            -0.1        0.90 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
      0.86            -0.1        0.80        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.70            -0.1        1.64        perf-profile.children.cycles-pp.__get_unmapped_area
      0.34 ±  3%      -0.1        0.29 ±  5%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      0.60            -0.0        0.55        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.92            -0.0        0.87        perf-profile.children.cycles-pp.percpu_counter_add_batch
      1.07            -0.0        1.02        perf-profile.children.cycles-pp.vma_to_resize
      1.59            -0.0        1.54        perf-profile.children.cycles-pp.mas_update_gap
      0.44 ±  2%      -0.0        0.40 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.70            -0.0        0.66        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.13            -0.0        1.09        perf-profile.children.cycles-pp.mt_find
      0.20 ±  6%      -0.0        0.17 ±  9%  perf-profile.children.cycles-pp.cap_vm_enough_memory
      0.99            -0.0        0.95        perf-profile.children.cycles-pp.mas_pop_node
      0.63 ±  2%      -0.0        0.59        perf-profile.children.cycles-pp.security_mmap_addr
      0.62            -0.0        0.59        perf-profile.children.cycles-pp.__put_partials
      1.17            -0.0        1.14        perf-profile.children.cycles-pp.clear_bhb_loop
      0.46            -0.0        0.43 ±  2%  perf-profile.children.cycles-pp.__alloc_pages_noprof
      0.44            -0.0        0.41 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.90            -0.0        0.87        perf-profile.children.cycles-pp.userfaultfd_unmap_complete
      0.64 ±  2%      -0.0        0.62        perf-profile.children.cycles-pp.get_old_pud
      1.07            -0.0        1.05        perf-profile.children.cycles-pp.mas_leaf_max_gap
      0.22 ±  3%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.55            -0.0        0.53        perf-profile.children.cycles-pp.refill_obj_stock
      0.25            -0.0        0.23 ±  3%  perf-profile.children.cycles-pp.rmqueue
      0.48            -0.0        0.45        perf-profile.children.cycles-pp.mremap_userfaultfd_prep
      0.33            -0.0        0.30        perf-profile.children.cycles-pp.free_unref_page
      0.46            -0.0        0.44        perf-profile.children.cycles-pp.setup_object
      0.21 ±  3%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.rmqueue_bulk
      0.31 ±  3%      -0.0        0.29        perf-profile.children.cycles-pp.__vm_enough_memory
      0.40            -0.0        0.38        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.36            -0.0        0.35        perf-profile.children.cycles-pp.madvise_vma_behavior
      0.54            -0.0        0.53 ±  2%  perf-profile.children.cycles-pp.mas_wr_end_piv
      0.46            -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
      0.34            -0.0        0.32 ±  2%  perf-profile.children.cycles-pp.mas_destroy
      0.28            -0.0        0.26 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_setup
      0.30            -0.0        0.28        perf-profile.children.cycles-pp.pte_offset_map_nolock
      0.19            -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.__thp_vma_allowable_orders
      0.08 ±  4%      -0.0        0.07        perf-profile.children.cycles-pp.ksm_madvise
      0.17            -0.0        0.16        perf-profile.children.cycles-pp.get_any_partial
      0.08            -0.0        0.07        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.45            +0.0        0.47        perf-profile.children.cycles-pp._raw_spin_lock
      1.10            +0.0        1.14        perf-profile.children.cycles-pp.zap_pte_range
      0.78            +0.1        0.85        perf-profile.children.cycles-pp.__madvise
      0.63            +0.1        0.70        perf-profile.children.cycles-pp.__x64_sys_madvise
      0.62            +0.1        0.70        perf-profile.children.cycles-pp.do_madvise
      0.00            +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.can_modify_mm_madv
      1.32            +0.1        1.46        perf-profile.children.cycles-pp.mas_next_slot
     88.13            +0.7       88.83        perf-profile.children.cycles-pp.mremap
     83.94            +0.9       84.88        perf-profile.children.cycles-pp.__do_sys_mremap
     86.06            +0.9       87.00        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     85.56            +1.0       86.54        perf-profile.children.cycles-pp.do_syscall_64
     40.49            +1.4       41.90        perf-profile.children.cycles-pp.do_vmi_munmap
      2.10            +1.5        3.57        perf-profile.children.cycles-pp.do_munmap
      3.62            +2.3        5.90        perf-profile.children.cycles-pp.mas_walk
      5.44            +2.9        8.38        perf-profile.children.cycles-pp.mremap_to
      5.30            +3.1        8.39        perf-profile.children.cycles-pp.mas_find
      0.00            +5.4        5.40        perf-profile.children.cycles-pp.can_modify_mm
     11.46            -0.5       10.96        perf-profile.self.cycles-pp.__slab_free
      4.30            -0.2        4.08        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
      2.51            -0.2        2.34        perf-profile.self.cycles-pp.rcu_cblist_dequeue
      2.41 ±  2%      -0.2        2.25        perf-profile.self.cycles-pp.down_write
      2.21            -0.1        2.11        perf-profile.self.cycles-pp.native_flush_tlb_one_user
      2.37            -0.1        2.28        perf-profile.self.cycles-pp.mtree_load
      1.60            -0.1        1.51        perf-profile.self.cycles-pp.__memcg_slab_free_hook
      0.18 ±  3%      -0.1        0.10 ± 15%  perf-profile.self.cycles-pp.vm_stat_account
      1.25            -0.1        1.18        perf-profile.self.cycles-pp.move_vma
      1.76            -0.1        1.69        perf-profile.self.cycles-pp.mod_objcg_state
      1.42            -0.1        1.35 ±  2%  perf-profile.self.cycles-pp.__call_rcu_common
      1.41            -0.1        1.34        perf-profile.self.cycles-pp.mas_wr_walk
      1.52            -0.1        1.46        perf-profile.self.cycles-pp.up_write
      1.02            -0.1        0.95        perf-profile.self.cycles-pp.mas_prev_slot
      0.96            -0.1        0.90 ±  2%  perf-profile.self.cycles-pp.vm_area_free_rcu_cb
      1.50            -0.1        1.45        perf-profile.self.cycles-pp.kmem_cache_free
      0.69 ±  3%      -0.1        0.64 ±  2%  perf-profile.self.cycles-pp.rcu_all_qs
      1.14 ±  2%      -0.1        1.09        perf-profile.self.cycles-pp.shuffle_freelist
      1.10            -0.1        1.05        perf-profile.self.cycles-pp.__cond_resched
      1.40            -0.0        1.35        perf-profile.self.cycles-pp.do_vmi_align_munmap
      0.99            -0.0        0.94        perf-profile.self.cycles-pp.mas_preallocate
      0.88            -0.0        0.83        perf-profile.self.cycles-pp.___slab_alloc
      0.55            -0.0        0.50        perf-profile.self.cycles-pp.mremap_to
      0.98            -0.0        0.93        perf-profile.self.cycles-pp.move_ptes
      0.78            -0.0        0.74        perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.21 ±  2%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.44 ±  2%      -0.0        0.40 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.92            -0.0        0.89        perf-profile.self.cycles-pp.mas_store_gfp
      0.86            -0.0        0.82        perf-profile.self.cycles-pp.mas_pop_node
      0.50            -0.0        0.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.15            -0.0        1.12        perf-profile.self.cycles-pp.clear_bhb_loop
      1.14            -0.0        1.11        perf-profile.self.cycles-pp.vma_merge
      0.66            -0.0        0.63        perf-profile.self.cycles-pp.__split_vma
      0.16 ±  6%      -0.0        0.13 ±  7%  perf-profile.self.cycles-pp.cap_vm_enough_memory
      0.82            -0.0        0.79        perf-profile.self.cycles-pp.mas_wr_store_entry
      0.54 ±  2%      -0.0        0.52        perf-profile.self.cycles-pp.get_old_pud
      0.43            -0.0        0.40        perf-profile.self.cycles-pp.do_munmap
      0.51 ±  2%      -0.0        0.48 ±  2%  perf-profile.self.cycles-pp.security_mmap_addr
      0.50            -0.0        0.48        perf-profile.self.cycles-pp.refill_obj_stock
      0.24            -0.0        0.22        perf-profile.self.cycles-pp.mas_prev
      0.71            -0.0        0.69        perf-profile.self.cycles-pp.unmap_page_range
      0.48            -0.0        0.45        perf-profile.self.cycles-pp.find_vma_prev
      0.42            -0.0        0.40        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.66            -0.0        0.64        perf-profile.self.cycles-pp.mas_store_prealloc
      0.31            -0.0        0.29        perf-profile.self.cycles-pp.mas_prev_setup
      0.43            -0.0        0.41        perf-profile.self.cycles-pp.mas_wr_end_piv
      0.78            -0.0        0.76        perf-profile.self.cycles-pp.userfaultfd_unmap_complete
      0.28            -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.mas_put_in_tree
      0.42            -0.0        0.40        perf-profile.self.cycles-pp.mremap_userfaultfd_prep
      0.28            -0.0        0.26        perf-profile.self.cycles-pp.free_pgtables
      0.39            -0.0        0.37        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.30 ±  2%      -0.0        0.28        perf-profile.self.cycles-pp.zap_pmd_range
      0.32            -0.0        0.31        perf-profile.self.cycles-pp.unmap_vmas
      0.21            -0.0        0.20        perf-profile.self.cycles-pp.__get_unmapped_area
      0.18 ±  2%      -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.lru_add_drain_cpu
      0.06            -0.0        0.05        perf-profile.self.cycles-pp.ksm_madvise
      0.45            +0.0        0.46        perf-profile.self.cycles-pp.do_vmi_munmap
      0.37            +0.0        0.39        perf-profile.self.cycles-pp._raw_spin_lock
      1.06            +0.1        1.18        perf-profile.self.cycles-pp.mas_next_slot
      1.50            +0.5        1.97        perf-profile.self.cycles-pp.mas_find
      0.00            +1.4        1.35        perf-profile.self.cycles-pp.can_modify_mm
      3.13            +2.0        5.13        perf-profile.self.cycles-pp.mas_walk


***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/pkey/stress-ng/60s

commit: 
  ff388fe5c4 ("mseal: wire up mseal syscall")
  8be7258aad ("mseal: add mseal syscall")

ff388fe5c481d39c 8be7258aad44b5e25977a98db13 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     10539            -2.5%      10273        vmstat.system.cs
      0.28 ±  5%     -20.1%       0.22 ±  7%  sched_debug.cfs_rq:/.h_nr_running.stddev
      1419 ±  7%     -15.3%       1202 ±  6%  sched_debug.cfs_rq:/.util_avg.max
      0.28 ±  6%     -18.4%       0.23 ±  8%  sched_debug.cpu.nr_running.stddev
 8.736e+08            -3.6%  8.423e+08        stress-ng.pkey.ops
  14560560            -3.6%   14038795        stress-ng.pkey.ops_per_sec
    770.39 ±  4%      -5.0%     732.04        stress-ng.time.user_time
    244657 ±  3%      +5.8%     258782 ±  3%  proc-vmstat.nr_slab_unreclaimable
  73133541            -2.1%   71588873        proc-vmstat.numa_hit
  72873579            -2.1%   71357274        proc-vmstat.numa_local
 1.842e+08            -2.5%  1.796e+08        proc-vmstat.pgalloc_normal
 1.767e+08            -2.8%  1.717e+08        proc-vmstat.pgfree
   1345346 ± 40%     -73.1%     362064 ±124%  numa-vmstat.node0.nr_inactive_anon
   1345340 ± 40%     -73.1%     362062 ±124%  numa-vmstat.node0.nr_zone_inactive_anon
   2420830 ± 14%     +35.1%    3270248 ± 16%  numa-vmstat.node1.nr_file_pages
   2067871 ± 13%     +51.5%    3132982 ± 17%  numa-vmstat.node1.nr_inactive_anon
    191406 ± 17%     +33.6%     255808 ± 14%  numa-vmstat.node1.nr_mapped
      2452 ± 61%    +104.4%       5012 ± 35%  numa-vmstat.node1.nr_page_table_pages
   2067853 ± 13%     +51.5%    3132966 ± 17%  numa-vmstat.node1.nr_zone_inactive_anon
   5379238 ± 40%     -73.0%    1453605 ±123%  numa-meminfo.node0.Inactive
   5379166 ± 40%     -73.0%    1453462 ±123%  numa-meminfo.node0.Inactive(anon)
   8741077 ± 22%     -36.7%    5531290 ± 28%  numa-meminfo.node0.MemUsed
   9651902 ± 13%     +35.8%   13105318 ± 16%  numa-meminfo.node1.FilePages
   8239855 ± 13%     +52.4%   12556929 ± 17%  numa-meminfo.node1.Inactive
   8239712 ± 13%     +52.4%   12556853 ± 17%  numa-meminfo.node1.Inactive(anon)
    761944 ± 18%     +34.6%    1025906 ± 14%  numa-meminfo.node1.Mapped
  11679628 ± 11%     +31.2%   15322841 ± 14%  numa-meminfo.node1.MemUsed
      9874 ± 62%    +104.6%      20200 ± 36%  numa-meminfo.node1.PageTables
      0.74            -4.2%       0.71        perf-stat.i.MPKI
 1.245e+11            +2.3%  1.274e+11        perf-stat.i.branch-instructions
      0.37            -0.0        0.35        perf-stat.i.branch-miss-rate%
 4.359e+08            -2.1%  4.265e+08        perf-stat.i.branch-misses
 4.672e+08            -2.6%  4.548e+08        perf-stat.i.cache-misses
 7.276e+08            -2.7%  7.082e+08        perf-stat.i.cache-references
      1.00            -1.6%       0.98        perf-stat.i.cpi
      1364            +2.9%       1404        perf-stat.i.cycles-between-cache-misses
 6.392e+11            +1.7%  6.499e+11        perf-stat.i.instructions
      1.00            +1.6%       1.02        perf-stat.i.ipc
      0.74            -4.3%       0.71        perf-stat.overall.MPKI
      0.35            -0.0        0.33        perf-stat.overall.branch-miss-rate%
      1.00            -1.6%       0.99        perf-stat.overall.cpi
      1356            +2.9%       1395        perf-stat.overall.cycles-between-cache-misses
      1.00            +1.6%       1.01        perf-stat.overall.ipc
 1.209e+11            +1.9%  1.232e+11        perf-stat.ps.branch-instructions
 4.188e+08            -2.6%  4.077e+08        perf-stat.ps.branch-misses
 4.585e+08            -3.1%  4.441e+08        perf-stat.ps.cache-misses
 7.124e+08            -3.1%  6.901e+08        perf-stat.ps.cache-references
     10321            -2.6%      10053        perf-stat.ps.context-switches
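
For readers checking the table by hand: the %change column appears to be the relative difference between the two commits' mean values, (new - base) / base * 100. Rows whose change carries no % sign (the miss-rate% rows above, and the perf-profile rows) look like absolute differences in percentage points instead. A minimal C sketch of the relative case follows; the helper name pct_change is made up for illustration and is not part of lkp-tests.

```c
#include <assert.h>

/* Hypothetical helper (not from lkp-tests): relative change in percent
 * between a baseline mean and the mean measured on the tested commit. */
double pct_change(double base, double tested)
{
    assert(base != 0.0);            /* a zero baseline has no relative change */
    return (tested - base) / base * 100.0;
}
```

Fed the stress-ng.pkey.ops_per_sec row, pct_change(14560560, 14038795) comes out at roughly -3.6, which matches the reported figure.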





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-04  8:59 [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression kernel test robot
@ 2024-08-04 20:32 ` Linus Torvalds
  2024-08-05 13:33   ` Pedro Falcato
  2024-08-05 17:54   ` Jeff Xu
  2024-08-05 13:56 ` Jeff Xu
  2024-08-05 16:58 ` Jeff Xu
  2 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2024-08-04 20:32 UTC (permalink / raw)
  To: kernel test robot
  Cc: Jeff Xu, oe-lkp, lkp, linux-kernel, Andrew Morton, Kees Cook,
	Liam R. Howlett, Pedro Falcato, Dave Hansen, Greg Kroah-Hartman,
	Guenter Roeck, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jorge Lucangeli Obes, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin

On Sun, 4 Aug 2024 at 01:59, kernel test robot <oliver.sang@intel.com> wrote:
>
> kernel test robot noticed a -4.4% regression of stress-ng.pagemove.page_remaps_per_sec on
> commit 8be7258aad44 ("mseal: add mseal syscall")

Ok, it's basically just the vma walk in can_modify_mm():

>       1.06            +0.1        1.18        perf-profile.self.cycles-pp.mas_next_slot
>       1.50            +0.5        1.97        perf-profile.self.cycles-pp.mas_find
>       0.00            +1.4        1.35        perf-profile.self.cycles-pp.can_modify_mm
>       3.13            +2.0        5.13        perf-profile.self.cycles-pp.mas_walk

and looks like it's two different pathways. We have __do_sys_mremap ->
mremap_to -> do_munmap -> do_vmi_munmap -> can_modify_mm for the
destination mapping, but we also have mremap_to() calling
can_modify_mm() directly for the source mapping.

And then do_vmi_munmap() will do its *own* vma_find() after having
done arch_unmap().

And do_munmap() will obviously do its own vma lookup as part of
calling vma_to_resize().

So it looks like a large portion of this regression is because the
mseal addition just ends up walking the vma list way too much.

              Linus
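
The mremap_to() pathway discussed above is reached from user space when mremap() is called with MREMAP_FIXED. A minimal user-space sketch of one such call is given here, assuming Linux with glibc; the function name remap_one_page_fixed is hypothetical, and this is not the stress-ng pagemove source, only a single remap in the same spirit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: move one anonymous page onto a fixed destination
 * address. Returns 0 on success and -1 on failure. */
int remap_one_page_fixed(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t sz = (size_t)page;

    /* Map two adjacent pages: page 0 is the source, page 1 the destination. */
    char *base = mmap(NULL, 2 * sz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    char *dst = base + sz;

    base[0] = 'x';                  /* marker byte that should travel with the page */

    /* MREMAP_FIXED must be combined with MREMAP_MAYMOVE. Whatever is mapped
     * at dst is unmapped first, so both source and destination get looked up. */
    char *moved = mremap(base, sz, sz, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
    if (moved == MAP_FAILED) {
        munmap(base, 2 * sz);
        return -1;
    }
    assert(moved == dst);           /* MREMAP_FIXED places the mapping exactly at dst */

    int ok = (moved[0] == 'x');
    munmap(moved, sz);              /* the source page was already unmapped by the move */
    return ok ? 0 : -1;
}
```

Calling this in a tight loop from every CPU is roughly the kind of load under which the extra VMA walks in can_modify_mm() would become visible in a profile.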

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-04 20:32 ` Linus Torvalds
@ 2024-08-05 13:33   ` Pedro Falcato
  2024-08-05 18:10     ` Jeff Xu
  2024-08-05 17:54   ` Jeff Xu
  1 sibling, 1 reply; 29+ messages in thread
From: Pedro Falcato @ 2024-08-05 13:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kernel test robot, Jeff Xu, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Kees Cook, Liam R. Howlett, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Sun, Aug 4, 2024 at 9:33 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, 4 Aug 2024 at 01:59, kernel test robot <oliver.sang@intel.com> wrote:
> >
> > kernel test robot noticed a -4.4% regression of stress-ng.pagemove.page_remaps_per_sec on
> > commit 8be7258aad44 ("mseal: add mseal syscall")
>
> Ok, it's basically just the vma walk in can_modify_mm():
>
> >       1.06            +0.1        1.18        perf-profile.self.cycles-pp.mas_next_slot
> >       1.50            +0.5        1.97        perf-profile.self.cycles-pp.mas_find
> >       0.00            +1.4        1.35        perf-profile.self.cycles-pp.can_modify_mm
> >       3.13            +2.0        5.13        perf-profile.self.cycles-pp.mas_walk
>
> and looks like it's two different pathways. We have __do_sys_mremap ->
> mremap_to -> do_munmap -> do_vmi_munmap -> can_modify_mm for the
> destination mapping, but we also have mremap_to() calling
> can_modify_mm() directly for the source mapping.
>
> And then do_vmi_munmap() will do its *own* vma_find() after having
> done arch_unmap().
>
> And do_munmap() will obviously do its own vma lookup as part of
> calling vma_to_resize().
>
> So it looks like a large portion of this regression is because the
> mseal addition just ends up walking the vma list way too much.

Can we roll back the upfront checks "funny business" and just call
can_modify_vma directly in relevant places? I still don't believe in
the partial mprotect/munmap "security risks" that were stated in the
mseal thread (and these operations can already fail for many other
reasons than mseal) :)

I don't mind taking a look myself, just want to make sure I'm not
stepping on anyone's toes here.

-- 
Pedro
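
The trade-off raised above, an upfront seal check over the whole range versus a per-VMA check at the point of modification, can be sketched with a toy model. Everything in it is hypothetical: struct toy_vma, both functions and the visit counter are illustration only and do not mirror the kernel's can_modify_mm() or can_modify_vma() implementations, and -1 stands in for the error the real code would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VMA: only the two bits this sketch cares about. */
struct toy_vma {
    bool sealed;
    bool modified;
};

/* Upfront style: walk the whole range checking seals, then walk it again
 * to modify. Nothing is touched if any VMA is sealed. */
int modify_with_upfront_check(struct toy_vma *v, size_t n, size_t *visits)
{
    assert(v != NULL && visits != NULL);
    for (size_t i = 0; i < n; i++) {
        (*visits)++;
        if (v[i].sealed)
            return -1;
    }
    for (size_t i = 0; i < n; i++) {
        (*visits)++;
        v[i].modified = true;
    }
    return 0;
}

/* Inline style: check each VMA as it is reached. One walk only, but a
 * sealed VMA in the middle leaves the earlier ones already modified. */
int modify_with_inline_check(struct toy_vma *v, size_t n, size_t *visits)
{
    assert(v != NULL && visits != NULL);
    for (size_t i = 0; i < n; i++) {
        (*visits)++;
        if (v[i].sealed)
            return -1;
        v[i].modified = true;
    }
    return 0;
}
```

On an unsealed range of n entries the upfront variant visits 2n entries and the inline variant n. The price of the inline variant is the partial-modification case, which is the behaviour the question above argues is acceptable.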

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-04  8:59 [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression kernel test robot
  2024-08-04 20:32 ` Linus Torvalds
@ 2024-08-05 13:56 ` Jeff Xu
  2024-08-05 16:58 ` Jeff Xu
  2 siblings, 0 replies; 29+ messages in thread
From: Jeff Xu @ 2024-08-05 13:56 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Kees Cook,
	Liam R. Howlett, Pedro Falcato, Dave Hansen, Greg Kroah-Hartman,
	Guenter Roeck, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jorge Lucangeli Obes, Linus Torvalds, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Sun, Aug 4, 2024 at 1:59 AM kernel test robot <oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -4.4% regression of stress-ng.pagemove.page_remaps_per_sec on:
>
Looking.
I'm setting up the environment so I can repro.

>
> commit: 8be7258aad44b5e25977a98db136f677fa6f4370 ("mseal: add mseal syscall")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: stress-ng
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
>
>         nr_threads: 100%
>         testtime: 60s
>         test: pagemove
>         cpufreq_governor: performance
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+---------------------------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.pkey.ops_per_sec -3.6% regression                                      |
> | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
> | test parameters  | cpufreq_governor=performance                                                                |
> |                  | nr_threads=100%                                                                             |
> |                  | test=pkey                                                                                   |
> |                  | testtime=60s                                                                                |
> +------------------+---------------------------------------------------------------------------------------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202408041602.caa0372-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240804/202408041602.caa0372-oliver.sang@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pagemove/stress-ng/60s
>
> commit:
>   ff388fe5c4 ("mseal: wire up mseal syscall")
>   8be7258aad ("mseal: add mseal syscall")
>
> ff388fe5c481d39c 8be7258aad44b5e25977a98db13
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>   41625945            -4.3%   39842322        proc-vmstat.numa_hit
>   41559175            -4.3%   39774160        proc-vmstat.numa_local
>   77484314            -4.4%   74105555        proc-vmstat.pgalloc_normal
>   77205752            -4.4%   73826672        proc-vmstat.pgfree
>   18361466            -4.2%   17596652        stress-ng.pagemove.ops
>     306014            -4.2%     293262        stress-ng.pagemove.ops_per_sec
>     205312            -4.4%     196176        stress-ng.pagemove.page_remaps_per_sec
>       4961            +1.0%       5013        stress-ng.time.percent_of_cpu_this_job_got
>       2917            +1.2%       2952        stress-ng.time.system_time
>       1.07            -6.6%       1.00        perf-stat.i.MPKI
>  3.354e+10            +3.5%  3.473e+10        perf-stat.i.branch-instructions
>  1.795e+08            -4.2%  1.719e+08        perf-stat.i.cache-misses
>  2.376e+08            -4.1%  2.279e+08        perf-stat.i.cache-references
>       1.13            -3.0%       1.10        perf-stat.i.cpi
>       1077            +4.3%       1124        perf-stat.i.cycles-between-cache-misses
>  1.717e+11            +2.7%  1.762e+11        perf-stat.i.instructions
>       0.88            +3.1%       0.91        perf-stat.i.ipc
>       1.05            -6.8%       0.97        perf-stat.overall.MPKI
>       0.25 ±  2%      -0.0        0.24        perf-stat.overall.branch-miss-rate%
>       1.13            -3.0%       1.10        perf-stat.overall.cpi
>       1084            +4.0%       1127        perf-stat.overall.cycles-between-cache-misses
>       0.88            +3.1%       0.91        perf-stat.overall.ipc
>  3.298e+10            +3.5%  3.415e+10        perf-stat.ps.branch-instructions
>  1.764e+08            -4.3%  1.689e+08        perf-stat.ps.cache-misses
>  2.336e+08            -4.1%   2.24e+08        perf-stat.ps.cache-references
>     194.57            -2.4%     189.96 ±  2%  perf-stat.ps.cpu-migrations
>  1.688e+11            +2.7%  1.733e+11        perf-stat.ps.instructions
>  1.036e+13            +3.0%  1.068e+13        perf-stat.total.instructions
>      75.12            -1.9       73.22        perf-profile.calltrace.cycles-pp.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>      36.84            -1.6       35.29        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
>      24.90            -1.2       23.72        perf-profile.calltrace.cycles-pp.copy_vma.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      19.89            -0.9       18.98        perf-profile.calltrace.cycles-pp.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>      10.56 ±  2%      -0.8        9.78 ±  2%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
>      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
>      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      10.57 ±  2%      -0.8        9.80 ±  2%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      10.52 ±  2%      -0.8        9.75 ±  2%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
>      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>      14.75            -0.7       14.07        perf-profile.calltrace.cycles-pp.vma_merge.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
>       1.50            -0.6        0.94        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
>       5.88 ±  2%      -0.4        5.47 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
>       7.80            -0.3        7.47        perf-profile.calltrace.cycles-pp.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       4.55 ±  2%      -0.3        4.24 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
>       6.76            -0.3        6.45        perf-profile.calltrace.cycles-pp.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       6.15            -0.3        5.86        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap
>       8.22            -0.3        7.93        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       6.12            -0.3        5.87        perf-profile.calltrace.cycles-pp.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       5.74            -0.2        5.50        perf-profile.calltrace.cycles-pp.move_ptes.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
>       3.16 ±  2%      -0.2        2.94        perf-profile.calltrace.cycles-pp.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
>       5.50            -0.2        5.28        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap
>       1.36            -0.2        1.14        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
>       5.15            -0.2        4.94        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap
>       5.51            -0.2        5.31        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       5.16            -0.2        4.97        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma.move_vma
>       2.24            -0.2        2.05        perf-profile.calltrace.cycles-pp.find_vma_prev.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
>       2.60 ±  2%      -0.2        2.42 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs
>       4.67            -0.2        4.49        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma
>       3.41            -0.2        3.23        perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       3.00            -0.2        2.83 ±  2%  perf-profile.calltrace.cycles-pp.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       0.96            -0.2        0.80        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.do_munmap.mremap_to
>       4.04            -0.2        3.88        perf-profile.calltrace.cycles-pp.vm_area_dup.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
>       3.20 ±  2%      -0.2        3.04 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
>       3.53            -0.1        3.38        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma.__do_sys_mremap
>       3.40            -0.1        3.26        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma.__do_sys_mremap
>       2.20 ±  2%      -0.1        2.06 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap
>       1.84 ±  3%      -0.1        1.71 ±  3%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap
>       1.78 ±  2%      -0.1        1.65 ±  3%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       2.69            -0.1        2.56        perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       1.78 ±  2%      -0.1        1.66 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
>       1.36 ±  2%      -0.1        1.23 ±  2%  perf-profile.calltrace.cycles-pp.rcu_cblist_dequeue.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
>       0.95            -0.1        0.83        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap
>       3.29            -0.1        3.17        perf-profile.calltrace.cycles-pp.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       2.08            -0.1        1.96        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       1.43 ±  3%      -0.1        1.32 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_merge.copy_vma.move_vma
>       2.21            -0.1        2.10        perf-profile.calltrace.cycles-pp.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       2.47            -0.1        2.36        perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma
>       2.21            -0.1        2.12        perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables
>       1.41            -0.1        1.32        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap
>       1.26            -0.1        1.18        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap
>       1.82            -0.1        1.75        perf-profile.calltrace.cycles-pp.vma_link.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
>       0.71            -0.1        0.63        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       1.29            -0.1        1.22        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       0.61            -0.1        0.54        perf-profile.calltrace.cycles-pp.mas_prev_setup.mas_prev.vma_merge.copy_vma.move_vma
>       1.36            -0.1        1.29        perf-profile.calltrace.cycles-pp.kmem_cache_free.unlink_anon_vmas.free_pgtables.unmap_region.do_vmi_align_munmap
>       1.40            -0.1        1.33        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma.move_vma
>       0.70            -0.1        0.64        perf-profile.calltrace.cycles-pp.mas_walk.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
>       1.23            -0.1        1.17        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma
>       1.66            -0.1        1.60        perf-profile.calltrace.cycles-pp.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.16            -0.1        1.10        perf-profile.calltrace.cycles-pp.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
>       0.96            -0.1        0.90        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.unlink_anon_vmas.free_pgtables.unmap_region
>       1.14            -0.1        1.08        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
>       0.79            -0.1        0.74        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
>       1.04            -0.1        1.00        perf-profile.calltrace.cycles-pp.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.58            -0.0        0.53        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.mremap
>       0.61            -0.0        0.56        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
>       0.56            -0.0        0.52        perf-profile.calltrace.cycles-pp.mas_find.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
>       0.57            -0.0        0.53 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
>       0.78            -0.0        0.74        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
>       0.88            -0.0        0.84        perf-profile.calltrace.cycles-pp.mtree_load.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64
>       0.70            -0.0        0.66        perf-profile.calltrace.cycles-pp.__call_rcu_common.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.mremap
>       0.97            -0.0        0.93        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       1.11            -0.0        1.08        perf-profile.calltrace.cycles-pp.clear_bhb_loop.mremap
>       0.75            -0.0        0.72        perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma
>       0.74            -0.0        0.71        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap
>       0.60 ±  2%      -0.0        0.57        perf-profile.calltrace.cycles-pp.security_mmap_addr.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64
>       0.67 ±  2%      -0.0        0.64        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma
>       0.82            -0.0        0.79        perf-profile.calltrace.cycles-pp.mtree_load.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>       0.63            -0.0        0.60        perf-profile.calltrace.cycles-pp.mas_prev_slot.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       0.99            -0.0        0.96        perf-profile.calltrace.cycles-pp.mt_find.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       0.62 ±  2%      -0.0        0.59        perf-profile.calltrace.cycles-pp.get_old_pud.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
>       0.87            -0.0        0.84        perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>       0.78            -0.0        0.75        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_link.copy_vma.move_vma.__do_sys_mremap
>       0.64            -0.0        0.62        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_link.copy_vma.move_vma.__do_sys_mremap
>       0.90            -0.0        0.87        perf-profile.calltrace.cycles-pp.mtree_load.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       0.54            -0.0        0.52        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.move_ptes.move_page_tables.move_vma.__do_sys_mremap
>       1.04            +0.0        1.08        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
>       0.76            +0.1        0.83        perf-profile.calltrace.cycles-pp.__madvise
>       0.63            +0.1        0.70        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>       0.62            +0.1        0.70        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>       0.66            +0.1        0.74        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
>       0.66            +0.1        0.74        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>      87.74            +0.7       88.45        perf-profile.calltrace.cycles-pp.mremap
>       0.00            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.do_munmap
>       0.00            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.mremap_to.__do_sys_mremap
>      84.88            +0.9       85.77        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mremap
>      84.73            +0.9       85.62        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>       0.00            +0.9        0.92 ±  2%  perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.move_vma
>      83.84            +0.9       84.78        perf-profile.calltrace.cycles-pp.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>       0.00            +1.1        1.06        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64
>       0.00            +1.2        1.21        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to
>       2.07            +1.5        3.55        perf-profile.calltrace.cycles-pp.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.58            +1.5        3.07        perf-profile.calltrace.cycles-pp.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64
>       0.00            +1.5        1.52        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap
>       0.00            +1.6        1.57        perf-profile.calltrace.cycles-pp.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +1.7        1.72        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
>       0.00            +2.0        2.01        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
>       5.39            +2.9        8.32        perf-profile.calltrace.cycles-pp.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>      75.29            -1.9       73.37        perf-profile.children.cycles-pp.move_vma
>      37.06            -1.6       35.50        perf-profile.children.cycles-pp.do_vmi_align_munmap
>      24.98            -1.2       23.80        perf-profile.children.cycles-pp.copy_vma
>      19.99            -1.0       19.02        perf-profile.children.cycles-pp.handle_softirqs
>      19.97            -1.0       19.00        perf-profile.children.cycles-pp.rcu_core
>      19.95            -1.0       18.98        perf-profile.children.cycles-pp.rcu_do_batch
>      19.98            -0.9       19.06        perf-profile.children.cycles-pp.__split_vma
>      17.55            -0.8       16.76        perf-profile.children.cycles-pp.kmem_cache_free
>      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.children.cycles-pp.run_ksoftirqd
>      10.57 ±  2%      -0.8        9.80 ±  2%  perf-profile.children.cycles-pp.smpboot_thread_fn
>      15.38            -0.8       14.62        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
>      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.children.cycles-pp.kthread
>      10.62 ±  2%      -0.8        9.86 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
>      10.62 ±  2%      -0.8        9.86 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
>      15.14            -0.7       14.44        perf-profile.children.cycles-pp.vma_merge
>      12.08            -0.5       11.55        perf-profile.children.cycles-pp.__slab_free
>      12.11            -0.5       11.62        perf-profile.children.cycles-pp.mas_wr_store_entry
>      10.86            -0.5       10.39        perf-profile.children.cycles-pp.vm_area_dup
>      11.89            -0.5       11.44        perf-profile.children.cycles-pp.mas_store_prealloc
>       8.49            -0.4        8.06        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
>       9.88            -0.4        9.49        perf-profile.children.cycles-pp.mas_wr_node_store
>       7.91            -0.3        7.58        perf-profile.children.cycles-pp.move_page_tables
>       6.06            -0.3        5.78        perf-profile.children.cycles-pp.vm_area_free_rcu_cb
>       8.28            -0.3        8.00        perf-profile.children.cycles-pp.unmap_region
>       6.69            -0.3        6.42        perf-profile.children.cycles-pp.vma_complete
>       5.06            -0.3        4.80        perf-profile.children.cycles-pp.mas_preallocate
>       5.82            -0.2        5.57        perf-profile.children.cycles-pp.move_ptes
>       4.24            -0.2        4.01        perf-profile.children.cycles-pp.anon_vma_clone
>       3.50            -0.2        3.30        perf-profile.children.cycles-pp.down_write
>       2.44            -0.2        2.25        perf-profile.children.cycles-pp.find_vma_prev
>       3.46            -0.2        3.28        perf-profile.children.cycles-pp.___slab_alloc
>       3.45            -0.2        3.27        perf-profile.children.cycles-pp.free_pgtables
>       2.54            -0.2        2.37        perf-profile.children.cycles-pp.rcu_cblist_dequeue
>       3.35            -0.2        3.18        perf-profile.children.cycles-pp.__memcg_slab_free_hook
>       2.93            -0.2        2.78        perf-profile.children.cycles-pp.mas_alloc_nodes
>       2.28 ±  2%      -0.2        2.12 ±  2%  perf-profile.children.cycles-pp.vma_prepare
>       3.46            -0.1        3.32        perf-profile.children.cycles-pp.flush_tlb_mm_range
>       3.41            -0.1        3.27 ±  2%  perf-profile.children.cycles-pp.mod_objcg_state
>       2.76            -0.1        2.63        perf-profile.children.cycles-pp.unlink_anon_vmas
>       3.41            -0.1        3.28        perf-profile.children.cycles-pp.mas_store_gfp
>       2.21            -0.1        2.09        perf-profile.children.cycles-pp.__cond_resched
>       2.04            -0.1        1.94        perf-profile.children.cycles-pp.allocate_slab
>       2.10            -0.1        2.00        perf-profile.children.cycles-pp.__call_rcu_common
>       2.51            -0.1        2.40        perf-profile.children.cycles-pp.flush_tlb_func
>       1.04            -0.1        0.94        perf-profile.children.cycles-pp.mas_prev
>       2.71            -0.1        2.61        perf-profile.children.cycles-pp.mtree_load
>       2.23            -0.1        2.14        perf-profile.children.cycles-pp.native_flush_tlb_one_user
>       0.22 ±  5%      -0.1        0.13 ± 13%  perf-profile.children.cycles-pp.vm_stat_account
>       0.95            -0.1        0.87        perf-profile.children.cycles-pp.mas_prev_setup
>       1.65            -0.1        1.57        perf-profile.children.cycles-pp.mas_wr_walk
>       1.84            -0.1        1.76        perf-profile.children.cycles-pp.up_write
>       1.27            -0.1        1.20        perf-profile.children.cycles-pp.mas_prev_slot
>       1.84            -0.1        1.77        perf-profile.children.cycles-pp.vma_link
>       1.39            -0.1        1.32        perf-profile.children.cycles-pp.shuffle_freelist
>       0.96            -0.1        0.90 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
>       0.86            -0.1        0.80        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       1.70            -0.1        1.64        perf-profile.children.cycles-pp.__get_unmapped_area
>       0.34 ±  3%      -0.1        0.29 ±  5%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
>       0.60            -0.0        0.55        perf-profile.children.cycles-pp.entry_SYSCALL_64
>       0.92            -0.0        0.87        perf-profile.children.cycles-pp.percpu_counter_add_batch
>       1.07            -0.0        1.02        perf-profile.children.cycles-pp.vma_to_resize
>       1.59            -0.0        1.54        perf-profile.children.cycles-pp.mas_update_gap
>       0.44 ±  2%      -0.0        0.40 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>       0.70            -0.0        0.66        perf-profile.children.cycles-pp.syscall_return_via_sysret
>       1.13            -0.0        1.09        perf-profile.children.cycles-pp.mt_find
>       0.20 ±  6%      -0.0        0.17 ±  9%  perf-profile.children.cycles-pp.cap_vm_enough_memory
>       0.99            -0.0        0.95        perf-profile.children.cycles-pp.mas_pop_node
>       0.63 ±  2%      -0.0        0.59        perf-profile.children.cycles-pp.security_mmap_addr
>       0.62            -0.0        0.59        perf-profile.children.cycles-pp.__put_partials
>       1.17            -0.0        1.14        perf-profile.children.cycles-pp.clear_bhb_loop
>       0.46            -0.0        0.43 ±  2%  perf-profile.children.cycles-pp.__alloc_pages_noprof
>       0.44            -0.0        0.41 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
>       0.90            -0.0        0.87        perf-profile.children.cycles-pp.userfaultfd_unmap_complete
>       0.64 ±  2%      -0.0        0.62        perf-profile.children.cycles-pp.get_old_pud
>       1.07            -0.0        1.05        perf-profile.children.cycles-pp.mas_leaf_max_gap
>       0.22 ±  3%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.__rmqueue_pcplist
>       0.55            -0.0        0.53        perf-profile.children.cycles-pp.refill_obj_stock
>       0.25            -0.0        0.23 ±  3%  perf-profile.children.cycles-pp.rmqueue
>       0.48            -0.0        0.45        perf-profile.children.cycles-pp.mremap_userfaultfd_prep
>       0.33            -0.0        0.30        perf-profile.children.cycles-pp.free_unref_page
>       0.46            -0.0        0.44        perf-profile.children.cycles-pp.setup_object
>       0.21 ±  3%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.rmqueue_bulk
>       0.31 ±  3%      -0.0        0.29        perf-profile.children.cycles-pp.__vm_enough_memory
>       0.40            -0.0        0.38        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.36            -0.0        0.35        perf-profile.children.cycles-pp.madvise_vma_behavior
>       0.54            -0.0        0.53 ±  2%  perf-profile.children.cycles-pp.mas_wr_end_piv
>       0.46            -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
>       0.34            -0.0        0.32 ±  2%  perf-profile.children.cycles-pp.mas_destroy
>       0.28            -0.0        0.26 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_setup
>       0.30            -0.0        0.28        perf-profile.children.cycles-pp.pte_offset_map_nolock
>       0.19            -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.__thp_vma_allowable_orders
>       0.08 ±  4%      -0.0        0.07        perf-profile.children.cycles-pp.ksm_madvise
>       0.17            -0.0        0.16        perf-profile.children.cycles-pp.get_any_partial
>       0.08            -0.0        0.07        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.45            +0.0        0.47        perf-profile.children.cycles-pp._raw_spin_lock
>       1.10            +0.0        1.14        perf-profile.children.cycles-pp.zap_pte_range
>       0.78            +0.1        0.85        perf-profile.children.cycles-pp.__madvise
>       0.63            +0.1        0.70        perf-profile.children.cycles-pp.__x64_sys_madvise
>       0.62            +0.1        0.70        perf-profile.children.cycles-pp.do_madvise
>       0.00            +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.can_modify_mm_madv
>       1.32            +0.1        1.46        perf-profile.children.cycles-pp.mas_next_slot
>      88.13            +0.7       88.83        perf-profile.children.cycles-pp.mremap
>      83.94            +0.9       84.88        perf-profile.children.cycles-pp.__do_sys_mremap
>      86.06            +0.9       87.00        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      85.56            +1.0       86.54        perf-profile.children.cycles-pp.do_syscall_64
>      40.49            +1.4       41.90        perf-profile.children.cycles-pp.do_vmi_munmap
>       2.10            +1.5        3.57        perf-profile.children.cycles-pp.do_munmap
>       3.62            +2.3        5.90        perf-profile.children.cycles-pp.mas_walk
>       5.44            +2.9        8.38        perf-profile.children.cycles-pp.mremap_to
>       5.30            +3.1        8.39        perf-profile.children.cycles-pp.mas_find
>       0.00            +5.4        5.40        perf-profile.children.cycles-pp.can_modify_mm
>      11.46            -0.5       10.96        perf-profile.self.cycles-pp.__slab_free
>       4.30            -0.2        4.08        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
>       2.51            -0.2        2.34        perf-profile.self.cycles-pp.rcu_cblist_dequeue
>       2.41 ±  2%      -0.2        2.25        perf-profile.self.cycles-pp.down_write
>       2.21            -0.1        2.11        perf-profile.self.cycles-pp.native_flush_tlb_one_user
>       2.37            -0.1        2.28        perf-profile.self.cycles-pp.mtree_load
>       1.60            -0.1        1.51        perf-profile.self.cycles-pp.__memcg_slab_free_hook
>       0.18 ±  3%      -0.1        0.10 ± 15%  perf-profile.self.cycles-pp.vm_stat_account
>       1.25            -0.1        1.18        perf-profile.self.cycles-pp.move_vma
>       1.76            -0.1        1.69        perf-profile.self.cycles-pp.mod_objcg_state
>       1.42            -0.1        1.35 ±  2%  perf-profile.self.cycles-pp.__call_rcu_common
>       1.41            -0.1        1.34        perf-profile.self.cycles-pp.mas_wr_walk
>       1.52            -0.1        1.46        perf-profile.self.cycles-pp.up_write
>       1.02            -0.1        0.95        perf-profile.self.cycles-pp.mas_prev_slot
>       0.96            -0.1        0.90 ±  2%  perf-profile.self.cycles-pp.vm_area_free_rcu_cb
>       1.50            -0.1        1.45        perf-profile.self.cycles-pp.kmem_cache_free
>       0.69 ±  3%      -0.1        0.64 ±  2%  perf-profile.self.cycles-pp.rcu_all_qs
>       1.14 ±  2%      -0.1        1.09        perf-profile.self.cycles-pp.shuffle_freelist
>       1.10            -0.1        1.05        perf-profile.self.cycles-pp.__cond_resched
>       1.40            -0.0        1.35        perf-profile.self.cycles-pp.do_vmi_align_munmap
>       0.99            -0.0        0.94        perf-profile.self.cycles-pp.mas_preallocate
>       0.88            -0.0        0.83        perf-profile.self.cycles-pp.___slab_alloc
>       0.55            -0.0        0.50        perf-profile.self.cycles-pp.mremap_to
>       0.98            -0.0        0.93        perf-profile.self.cycles-pp.move_ptes
>       0.78            -0.0        0.74        perf-profile.self.cycles-pp.percpu_counter_add_batch
>       0.21 ±  2%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64
>       0.44 ±  2%      -0.0        0.40 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>       0.92            -0.0        0.89        perf-profile.self.cycles-pp.mas_store_gfp
>       0.86            -0.0        0.82        perf-profile.self.cycles-pp.mas_pop_node
>       0.50            -0.0        0.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       1.15            -0.0        1.12        perf-profile.self.cycles-pp.clear_bhb_loop
>       1.14            -0.0        1.11        perf-profile.self.cycles-pp.vma_merge
>       0.66            -0.0        0.63        perf-profile.self.cycles-pp.__split_vma
>       0.16 ±  6%      -0.0        0.13 ±  7%  perf-profile.self.cycles-pp.cap_vm_enough_memory
>       0.82            -0.0        0.79        perf-profile.self.cycles-pp.mas_wr_store_entry
>       0.54 ±  2%      -0.0        0.52        perf-profile.self.cycles-pp.get_old_pud
>       0.43            -0.0        0.40        perf-profile.self.cycles-pp.do_munmap
>       0.51 ±  2%      -0.0        0.48 ±  2%  perf-profile.self.cycles-pp.security_mmap_addr
>       0.50            -0.0        0.48        perf-profile.self.cycles-pp.refill_obj_stock
>       0.24            -0.0        0.22        perf-profile.self.cycles-pp.mas_prev
>       0.71            -0.0        0.69        perf-profile.self.cycles-pp.unmap_page_range
>       0.48            -0.0        0.45        perf-profile.self.cycles-pp.find_vma_prev
>       0.42            -0.0        0.40        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.66            -0.0        0.64        perf-profile.self.cycles-pp.mas_store_prealloc
>       0.31            -0.0        0.29        perf-profile.self.cycles-pp.mas_prev_setup
>       0.43            -0.0        0.41        perf-profile.self.cycles-pp.mas_wr_end_piv
>       0.78            -0.0        0.76        perf-profile.self.cycles-pp.userfaultfd_unmap_complete
>       0.28            -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.mas_put_in_tree
>       0.42            -0.0        0.40        perf-profile.self.cycles-pp.mremap_userfaultfd_prep
>       0.28            -0.0        0.26        perf-profile.self.cycles-pp.free_pgtables
>       0.39            -0.0        0.37        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.30 ±  2%      -0.0        0.28        perf-profile.self.cycles-pp.zap_pmd_range
>       0.32            -0.0        0.31        perf-profile.self.cycles-pp.unmap_vmas
>       0.21            -0.0        0.20        perf-profile.self.cycles-pp.__get_unmapped_area
>       0.18 ±  2%      -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.lru_add_drain_cpu
>       0.06            -0.0        0.05        perf-profile.self.cycles-pp.ksm_madvise
>       0.45            +0.0        0.46        perf-profile.self.cycles-pp.do_vmi_munmap
>       0.37            +0.0        0.39        perf-profile.self.cycles-pp._raw_spin_lock
>       1.06            +0.1        1.18        perf-profile.self.cycles-pp.mas_next_slot
>       1.50            +0.5        1.97        perf-profile.self.cycles-pp.mas_find
>       0.00            +1.4        1.35        perf-profile.self.cycles-pp.can_modify_mm
>       3.13            +2.0        5.13        perf-profile.self.cycles-pp.mas_walk
>
>
> ***************************************************************************************************
> lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/pkey/stress-ng/60s
>
> commit:
>   ff388fe5c4 ("mseal: wire up mseal syscall")
>   8be7258aad ("mseal: add mseal syscall")
>
> ff388fe5c481d39c 8be7258aad44b5e25977a98db13
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>      10539            -2.5%      10273        vmstat.system.cs
>       0.28 ±  5%     -20.1%       0.22 ±  7%  sched_debug.cfs_rq:/.h_nr_running.stddev
>       1419 ±  7%     -15.3%       1202 ±  6%  sched_debug.cfs_rq:/.util_avg.max
>       0.28 ±  6%     -18.4%       0.23 ±  8%  sched_debug.cpu.nr_running.stddev
>  8.736e+08            -3.6%  8.423e+08        stress-ng.pkey.ops
>   14560560            -3.6%   14038795        stress-ng.pkey.ops_per_sec
>     770.39 ±  4%      -5.0%     732.04        stress-ng.time.user_time
>     244657 ±  3%      +5.8%     258782 ±  3%  proc-vmstat.nr_slab_unreclaimable
>   73133541            -2.1%   71588873        proc-vmstat.numa_hit
>   72873579            -2.1%   71357274        proc-vmstat.numa_local
>  1.842e+08            -2.5%  1.796e+08        proc-vmstat.pgalloc_normal
>  1.767e+08            -2.8%  1.717e+08        proc-vmstat.pgfree
>    1345346 ± 40%     -73.1%     362064 ±124%  numa-vmstat.node0.nr_inactive_anon
>    1345340 ± 40%     -73.1%     362062 ±124%  numa-vmstat.node0.nr_zone_inactive_anon
>    2420830 ± 14%     +35.1%    3270248 ± 16%  numa-vmstat.node1.nr_file_pages
>    2067871 ± 13%     +51.5%    3132982 ± 17%  numa-vmstat.node1.nr_inactive_anon
>     191406 ± 17%     +33.6%     255808 ± 14%  numa-vmstat.node1.nr_mapped
>       2452 ± 61%    +104.4%       5012 ± 35%  numa-vmstat.node1.nr_page_table_pages
>    2067853 ± 13%     +51.5%    3132966 ± 17%  numa-vmstat.node1.nr_zone_inactive_anon
>    5379238 ± 40%     -73.0%    1453605 ±123%  numa-meminfo.node0.Inactive
>    5379166 ± 40%     -73.0%    1453462 ±123%  numa-meminfo.node0.Inactive(anon)
>    8741077 ± 22%     -36.7%    5531290 ± 28%  numa-meminfo.node0.MemUsed
>    9651902 ± 13%     +35.8%   13105318 ± 16%  numa-meminfo.node1.FilePages
>    8239855 ± 13%     +52.4%   12556929 ± 17%  numa-meminfo.node1.Inactive
>    8239712 ± 13%     +52.4%   12556853 ± 17%  numa-meminfo.node1.Inactive(anon)
>     761944 ± 18%     +34.6%    1025906 ± 14%  numa-meminfo.node1.Mapped
>   11679628 ± 11%     +31.2%   15322841 ± 14%  numa-meminfo.node1.MemUsed
>       9874 ± 62%    +104.6%      20200 ± 36%  numa-meminfo.node1.PageTables
>       0.74            -4.2%       0.71        perf-stat.i.MPKI
>  1.245e+11            +2.3%  1.274e+11        perf-stat.i.branch-instructions
>       0.37            -0.0        0.35        perf-stat.i.branch-miss-rate%
>  4.359e+08            -2.1%  4.265e+08        perf-stat.i.branch-misses
>  4.672e+08            -2.6%  4.548e+08        perf-stat.i.cache-misses
>  7.276e+08            -2.7%  7.082e+08        perf-stat.i.cache-references
>       1.00            -1.6%       0.98        perf-stat.i.cpi
>       1364            +2.9%       1404        perf-stat.i.cycles-between-cache-misses
>  6.392e+11            +1.7%  6.499e+11        perf-stat.i.instructions
>       1.00            +1.6%       1.02        perf-stat.i.ipc
>       0.74            -4.3%       0.71        perf-stat.overall.MPKI
>       0.35            -0.0        0.33        perf-stat.overall.branch-miss-rate%
>       1.00            -1.6%       0.99        perf-stat.overall.cpi
>       1356            +2.9%       1395        perf-stat.overall.cycles-between-cache-misses
>       1.00            +1.6%       1.01        perf-stat.overall.ipc
>  1.209e+11            +1.9%  1.232e+11        perf-stat.ps.branch-instructions
>  4.188e+08            -2.6%  4.077e+08        perf-stat.ps.branch-misses
>  4.585e+08            -3.1%  4.441e+08        perf-stat.ps.cache-misses
>  7.124e+08            -3.1%  6.901e+08        perf-stat.ps.cache-references
>      10321            -2.6%      10053        perf-stat.ps.context-switches
>
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-04  8:59 [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression kernel test robot
  2024-08-04 20:32 ` Linus Torvalds
  2024-08-05 13:56 ` Jeff Xu
@ 2024-08-05 16:58 ` Jeff Xu
  2024-08-06  1:44   ` Oliver Sang
  2 siblings, 1 reply; 29+ messages in thread
From: Jeff Xu @ 2024-08-05 16:58 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Kees Cook,
	Liam R. Howlett, Pedro Falcato, Dave Hansen, Greg Kroah-Hartman,
	Guenter Roeck, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jorge Lucangeli Obes, Linus Torvalds, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Sun, Aug 4, 2024 at 1:59 AM kernel test robot <oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -4.4% regression of stress-ng.pagemove.page_remaps_per_sec on:
>
>
> commit: 8be7258aad44b5e25977a98db136f677fa6f4370 ("mseal: add mseal syscall")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: stress-ng
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
>
>         nr_threads: 100%
>         testtime: 60s
>         test: pagemove
>         cpufreq_governor: performance
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+---------------------------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.pkey.ops_per_sec -3.6% regression                                      |
> | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
> | test parameters  | cpufreq_governor=performance                                                                |
> |                  | nr_threads=100%                                                                             |
> |                  | test=pkey                                                                                   |
> |                  | testtime=60s                                                                                |
> +------------------+---------------------------------------------------------------------------------------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202408041602.caa0372-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240804/202408041602.caa0372-oliver.sang@intel.com
>
There is an error when I try to reproduce the test:

bin/lkp install job.yaml

--------------------------------------------------------
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libdw1 : Depends: libelf1 (= 0.190-1+b1)
 libdw1t64 : Breaks: libdw1 (< 0.191-2)
E: Unable to correct problems, you have held broken packages.
Cannot install some packages of perf-c2c depends
-----------------------------------------------------------------------------------------

Also, where is the stress-ng.pagemove.page_remaps_per_sec test implemented?
Is it part of lkp-tests?

Thanks
-Jeff

> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pagemove/stress-ng/60s
>
> commit:
>   ff388fe5c4 ("mseal: wire up mseal syscall")
>   8be7258aad ("mseal: add mseal syscall")
>
> ff388fe5c481d39c 8be7258aad44b5e25977a98db13
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>   41625945            -4.3%   39842322        proc-vmstat.numa_hit
>   41559175            -4.3%   39774160        proc-vmstat.numa_local
>   77484314            -4.4%   74105555        proc-vmstat.pgalloc_normal
>   77205752            -4.4%   73826672        proc-vmstat.pgfree
>   18361466            -4.2%   17596652        stress-ng.pagemove.ops
>     306014            -4.2%     293262        stress-ng.pagemove.ops_per_sec
>     205312            -4.4%     196176        stress-ng.pagemove.page_remaps_per_sec
>       4961            +1.0%       5013        stress-ng.time.percent_of_cpu_this_job_got
>       2917            +1.2%       2952        stress-ng.time.system_time
>       1.07            -6.6%       1.00        perf-stat.i.MPKI
>  3.354e+10            +3.5%  3.473e+10        perf-stat.i.branch-instructions
>  1.795e+08            -4.2%  1.719e+08        perf-stat.i.cache-misses
>  2.376e+08            -4.1%  2.279e+08        perf-stat.i.cache-references
>       1.13            -3.0%       1.10        perf-stat.i.cpi
>       1077            +4.3%       1124        perf-stat.i.cycles-between-cache-misses
>  1.717e+11            +2.7%  1.762e+11        perf-stat.i.instructions
>       0.88            +3.1%       0.91        perf-stat.i.ipc
>       1.05            -6.8%       0.97        perf-stat.overall.MPKI
>       0.25 ±  2%      -0.0        0.24        perf-stat.overall.branch-miss-rate%
>       1.13            -3.0%       1.10        perf-stat.overall.cpi
>       1084            +4.0%       1127        perf-stat.overall.cycles-between-cache-misses
>       0.88            +3.1%       0.91        perf-stat.overall.ipc
>  3.298e+10            +3.5%  3.415e+10        perf-stat.ps.branch-instructions
>  1.764e+08            -4.3%  1.689e+08        perf-stat.ps.cache-misses
>  2.336e+08            -4.1%   2.24e+08        perf-stat.ps.cache-references
>     194.57            -2.4%     189.96 ±  2%  perf-stat.ps.cpu-migrations
>  1.688e+11            +2.7%  1.733e+11        perf-stat.ps.instructions
>  1.036e+13            +3.0%  1.068e+13        perf-stat.total.instructions
>      75.12            -1.9       73.22        perf-profile.calltrace.cycles-pp.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>      36.84            -1.6       35.29        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
>      24.90            -1.2       23.72        perf-profile.calltrace.cycles-pp.copy_vma.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      19.89            -0.9       18.98        perf-profile.calltrace.cycles-pp.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>      10.56 ±  2%      -0.8        9.78 ±  2%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
>      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
>      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      10.57 ±  2%      -0.8        9.80 ±  2%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      10.52 ±  2%      -0.8        9.75 ±  2%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
>      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>      14.75            -0.7       14.07        perf-profile.calltrace.cycles-pp.vma_merge.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
>       1.50            -0.6        0.94        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
>       5.88 ±  2%      -0.4        5.47 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
>       7.80            -0.3        7.47        perf-profile.calltrace.cycles-pp.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       4.55 ±  2%      -0.3        4.24 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
>       6.76            -0.3        6.45        perf-profile.calltrace.cycles-pp.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       6.15            -0.3        5.86        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap
>       8.22            -0.3        7.93        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       6.12            -0.3        5.87        perf-profile.calltrace.cycles-pp.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       5.74            -0.2        5.50        perf-profile.calltrace.cycles-pp.move_ptes.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
>       3.16 ±  2%      -0.2        2.94        perf-profile.calltrace.cycles-pp.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
>       5.50            -0.2        5.28        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap
>       1.36            -0.2        1.14        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
>       5.15            -0.2        4.94        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap
>       5.51            -0.2        5.31        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       5.16            -0.2        4.97        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma.move_vma
>       2.24            -0.2        2.05        perf-profile.calltrace.cycles-pp.find_vma_prev.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
>       2.60 ±  2%      -0.2        2.42 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs
>       4.67            -0.2        4.49        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma
>       3.41            -0.2        3.23        perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       3.00            -0.2        2.83 ±  2%  perf-profile.calltrace.cycles-pp.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       0.96            -0.2        0.80        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.do_munmap.mremap_to
>       4.04            -0.2        3.88        perf-profile.calltrace.cycles-pp.vm_area_dup.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
>       3.20 ±  2%      -0.2        3.04 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
>       3.53            -0.1        3.38        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma.__do_sys_mremap
>       3.40            -0.1        3.26        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma.__do_sys_mremap
>       2.20 ±  2%      -0.1        2.06 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap
>       1.84 ±  3%      -0.1        1.71 ±  3%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap
>       1.78 ±  2%      -0.1        1.65 ±  3%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       2.69            -0.1        2.56        perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       1.78 ±  2%      -0.1        1.66 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
>       1.36 ±  2%      -0.1        1.23 ±  2%  perf-profile.calltrace.cycles-pp.rcu_cblist_dequeue.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
>       0.95            -0.1        0.83        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap
>       3.29            -0.1        3.17        perf-profile.calltrace.cycles-pp.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       2.08            -0.1        1.96        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       1.43 ±  3%      -0.1        1.32 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_merge.copy_vma.move_vma
>       2.21            -0.1        2.10        perf-profile.calltrace.cycles-pp.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       2.47            -0.1        2.36        perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma
>       2.21            -0.1        2.12        perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables
>       1.41            -0.1        1.32        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap
>       1.26            -0.1        1.18        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap
>       1.82            -0.1        1.75        perf-profile.calltrace.cycles-pp.vma_link.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
>       0.71            -0.1        0.63        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       1.29            -0.1        1.22        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma
>       0.61            -0.1        0.54        perf-profile.calltrace.cycles-pp.mas_prev_setup.mas_prev.vma_merge.copy_vma.move_vma
>       1.36            -0.1        1.29        perf-profile.calltrace.cycles-pp.kmem_cache_free.unlink_anon_vmas.free_pgtables.unmap_region.do_vmi_align_munmap
>       1.40            -0.1        1.33        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma.move_vma
>       0.70            -0.1        0.64        perf-profile.calltrace.cycles-pp.mas_walk.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
>       1.23            -0.1        1.17        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma
>       1.66            -0.1        1.60        perf-profile.calltrace.cycles-pp.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.16            -0.1        1.10        perf-profile.calltrace.cycles-pp.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
>       0.96            -0.1        0.90        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.unlink_anon_vmas.free_pgtables.unmap_region
>       1.14            -0.1        1.08        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
>       0.79            -0.1        0.74        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
>       1.04            -0.1        1.00        perf-profile.calltrace.cycles-pp.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.58            -0.0        0.53        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.mremap
>       0.61            -0.0        0.56        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
>       0.56            -0.0        0.52        perf-profile.calltrace.cycles-pp.mas_find.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
>       0.57            -0.0        0.53 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
>       0.78            -0.0        0.74        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
>       0.88            -0.0        0.84        perf-profile.calltrace.cycles-pp.mtree_load.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64
>       0.70            -0.0        0.66        perf-profile.calltrace.cycles-pp.__call_rcu_common.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.mremap
>       0.97            -0.0        0.93        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       1.11            -0.0        1.08        perf-profile.calltrace.cycles-pp.clear_bhb_loop.mremap
>       0.75            -0.0        0.72        perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma
>       0.74            -0.0        0.71        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap
>       0.60 ±  2%      -0.0        0.57        perf-profile.calltrace.cycles-pp.security_mmap_addr.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64
>       0.67 ±  2%      -0.0        0.64        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma
>       0.82            -0.0        0.79        perf-profile.calltrace.cycles-pp.mtree_load.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>       0.63            -0.0        0.60        perf-profile.calltrace.cycles-pp.mas_prev_slot.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
>       0.99            -0.0        0.96        perf-profile.calltrace.cycles-pp.mt_find.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       0.62 ±  2%      -0.0        0.59        perf-profile.calltrace.cycles-pp.get_old_pud.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
>       0.87            -0.0        0.84        perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>       0.78            -0.0        0.75        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_link.copy_vma.move_vma.__do_sys_mremap
>       0.64            -0.0        0.62        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_link.copy_vma.move_vma.__do_sys_mremap
>       0.90            -0.0        0.87        perf-profile.calltrace.cycles-pp.mtree_load.vma_merge.copy_vma.move_vma.__do_sys_mremap
>       0.54            -0.0        0.52        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.move_ptes.move_page_tables.move_vma.__do_sys_mremap
>       1.04            +0.0        1.08        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
>       0.76            +0.1        0.83        perf-profile.calltrace.cycles-pp.__madvise
>       0.63            +0.1        0.70        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>       0.62            +0.1        0.70        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>       0.66            +0.1        0.74        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
>       0.66            +0.1        0.74        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>      87.74            +0.7       88.45        perf-profile.calltrace.cycles-pp.mremap
>       0.00            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.do_munmap
>       0.00            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.mremap_to.__do_sys_mremap
>      84.88            +0.9       85.77        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mremap
>      84.73            +0.9       85.62        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>       0.00            +0.9        0.92 ±  2%  perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.move_vma
>      83.84            +0.9       84.78        perf-profile.calltrace.cycles-pp.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>       0.00            +1.1        1.06        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64
>       0.00            +1.2        1.21        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to
>       2.07            +1.5        3.55        perf-profile.calltrace.cycles-pp.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.58            +1.5        3.07        perf-profile.calltrace.cycles-pp.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64
>       0.00            +1.5        1.52        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap
>       0.00            +1.6        1.57        perf-profile.calltrace.cycles-pp.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +1.7        1.72        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
>       0.00            +2.0        2.01        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
>       5.39            +2.9        8.32        perf-profile.calltrace.cycles-pp.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
>      75.29            -1.9       73.37        perf-profile.children.cycles-pp.move_vma
>      37.06            -1.6       35.50        perf-profile.children.cycles-pp.do_vmi_align_munmap
>      24.98            -1.2       23.80        perf-profile.children.cycles-pp.copy_vma
>      19.99            -1.0       19.02        perf-profile.children.cycles-pp.handle_softirqs
>      19.97            -1.0       19.00        perf-profile.children.cycles-pp.rcu_core
>      19.95            -1.0       18.98        perf-profile.children.cycles-pp.rcu_do_batch
>      19.98            -0.9       19.06        perf-profile.children.cycles-pp.__split_vma
>      17.55            -0.8       16.76        perf-profile.children.cycles-pp.kmem_cache_free
>      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.children.cycles-pp.run_ksoftirqd
>      10.57 ±  2%      -0.8        9.80 ±  2%  perf-profile.children.cycles-pp.smpboot_thread_fn
>      15.38            -0.8       14.62        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
>      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.children.cycles-pp.kthread
>      10.62 ±  2%      -0.8        9.86 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
>      10.62 ±  2%      -0.8        9.86 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
>      15.14            -0.7       14.44        perf-profile.children.cycles-pp.vma_merge
>      12.08            -0.5       11.55        perf-profile.children.cycles-pp.__slab_free
>      12.11            -0.5       11.62        perf-profile.children.cycles-pp.mas_wr_store_entry
>      10.86            -0.5       10.39        perf-profile.children.cycles-pp.vm_area_dup
>      11.89            -0.5       11.44        perf-profile.children.cycles-pp.mas_store_prealloc
>       8.49            -0.4        8.06        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
>       9.88            -0.4        9.49        perf-profile.children.cycles-pp.mas_wr_node_store
>       7.91            -0.3        7.58        perf-profile.children.cycles-pp.move_page_tables
>       6.06            -0.3        5.78        perf-profile.children.cycles-pp.vm_area_free_rcu_cb
>       8.28            -0.3        8.00        perf-profile.children.cycles-pp.unmap_region
>       6.69            -0.3        6.42        perf-profile.children.cycles-pp.vma_complete
>       5.06            -0.3        4.80        perf-profile.children.cycles-pp.mas_preallocate
>       5.82            -0.2        5.57        perf-profile.children.cycles-pp.move_ptes
>       4.24            -0.2        4.01        perf-profile.children.cycles-pp.anon_vma_clone
>       3.50            -0.2        3.30        perf-profile.children.cycles-pp.down_write
>       2.44            -0.2        2.25        perf-profile.children.cycles-pp.find_vma_prev
>       3.46            -0.2        3.28        perf-profile.children.cycles-pp.___slab_alloc
>       3.45            -0.2        3.27        perf-profile.children.cycles-pp.free_pgtables
>       2.54            -0.2        2.37        perf-profile.children.cycles-pp.rcu_cblist_dequeue
>       3.35            -0.2        3.18        perf-profile.children.cycles-pp.__memcg_slab_free_hook
>       2.93            -0.2        2.78        perf-profile.children.cycles-pp.mas_alloc_nodes
>       2.28 ±  2%      -0.2        2.12 ±  2%  perf-profile.children.cycles-pp.vma_prepare
>       3.46            -0.1        3.32        perf-profile.children.cycles-pp.flush_tlb_mm_range
>       3.41            -0.1        3.27 ±  2%  perf-profile.children.cycles-pp.mod_objcg_state
>       2.76            -0.1        2.63        perf-profile.children.cycles-pp.unlink_anon_vmas
>       3.41            -0.1        3.28        perf-profile.children.cycles-pp.mas_store_gfp
>       2.21            -0.1        2.09        perf-profile.children.cycles-pp.__cond_resched
>       2.04            -0.1        1.94        perf-profile.children.cycles-pp.allocate_slab
>       2.10            -0.1        2.00        perf-profile.children.cycles-pp.__call_rcu_common
>       2.51            -0.1        2.40        perf-profile.children.cycles-pp.flush_tlb_func
>       1.04            -0.1        0.94        perf-profile.children.cycles-pp.mas_prev
>       2.71            -0.1        2.61        perf-profile.children.cycles-pp.mtree_load
>       2.23            -0.1        2.14        perf-profile.children.cycles-pp.native_flush_tlb_one_user
>       0.22 ±  5%      -0.1        0.13 ± 13%  perf-profile.children.cycles-pp.vm_stat_account
>       0.95            -0.1        0.87        perf-profile.children.cycles-pp.mas_prev_setup
>       1.65            -0.1        1.57        perf-profile.children.cycles-pp.mas_wr_walk
>       1.84            -0.1        1.76        perf-profile.children.cycles-pp.up_write
>       1.27            -0.1        1.20        perf-profile.children.cycles-pp.mas_prev_slot
>       1.84            -0.1        1.77        perf-profile.children.cycles-pp.vma_link
>       1.39            -0.1        1.32        perf-profile.children.cycles-pp.shuffle_freelist
>       0.96            -0.1        0.90 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
>       0.86            -0.1        0.80        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       1.70            -0.1        1.64        perf-profile.children.cycles-pp.__get_unmapped_area
>       0.34 ±  3%      -0.1        0.29 ±  5%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
>       0.60            -0.0        0.55        perf-profile.children.cycles-pp.entry_SYSCALL_64
>       0.92            -0.0        0.87        perf-profile.children.cycles-pp.percpu_counter_add_batch
>       1.07            -0.0        1.02        perf-profile.children.cycles-pp.vma_to_resize
>       1.59            -0.0        1.54        perf-profile.children.cycles-pp.mas_update_gap
>       0.44 ±  2%      -0.0        0.40 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>       0.70            -0.0        0.66        perf-profile.children.cycles-pp.syscall_return_via_sysret
>       1.13            -0.0        1.09        perf-profile.children.cycles-pp.mt_find
>       0.20 ±  6%      -0.0        0.17 ±  9%  perf-profile.children.cycles-pp.cap_vm_enough_memory
>       0.99            -0.0        0.95        perf-profile.children.cycles-pp.mas_pop_node
>       0.63 ±  2%      -0.0        0.59        perf-profile.children.cycles-pp.security_mmap_addr
>       0.62            -0.0        0.59        perf-profile.children.cycles-pp.__put_partials
>       1.17            -0.0        1.14        perf-profile.children.cycles-pp.clear_bhb_loop
>       0.46            -0.0        0.43 ±  2%  perf-profile.children.cycles-pp.__alloc_pages_noprof
>       0.44            -0.0        0.41 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
>       0.90            -0.0        0.87        perf-profile.children.cycles-pp.userfaultfd_unmap_complete
>       0.64 ±  2%      -0.0        0.62        perf-profile.children.cycles-pp.get_old_pud
>       1.07            -0.0        1.05        perf-profile.children.cycles-pp.mas_leaf_max_gap
>       0.22 ±  3%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.__rmqueue_pcplist
>       0.55            -0.0        0.53        perf-profile.children.cycles-pp.refill_obj_stock
>       0.25            -0.0        0.23 ±  3%  perf-profile.children.cycles-pp.rmqueue
>       0.48            -0.0        0.45        perf-profile.children.cycles-pp.mremap_userfaultfd_prep
>       0.33            -0.0        0.30        perf-profile.children.cycles-pp.free_unref_page
>       0.46            -0.0        0.44        perf-profile.children.cycles-pp.setup_object
>       0.21 ±  3%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.rmqueue_bulk
>       0.31 ±  3%      -0.0        0.29        perf-profile.children.cycles-pp.__vm_enough_memory
>       0.40            -0.0        0.38        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.36            -0.0        0.35        perf-profile.children.cycles-pp.madvise_vma_behavior
>       0.54            -0.0        0.53 ±  2%  perf-profile.children.cycles-pp.mas_wr_end_piv
>       0.46            -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
>       0.34            -0.0        0.32 ±  2%  perf-profile.children.cycles-pp.mas_destroy
>       0.28            -0.0        0.26 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_setup
>       0.30            -0.0        0.28        perf-profile.children.cycles-pp.pte_offset_map_nolock
>       0.19            -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.__thp_vma_allowable_orders
>       0.08 ±  4%      -0.0        0.07        perf-profile.children.cycles-pp.ksm_madvise
>       0.17            -0.0        0.16        perf-profile.children.cycles-pp.get_any_partial
>       0.08            -0.0        0.07        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.45            +0.0        0.47        perf-profile.children.cycles-pp._raw_spin_lock
>       1.10            +0.0        1.14        perf-profile.children.cycles-pp.zap_pte_range
>       0.78            +0.1        0.85        perf-profile.children.cycles-pp.__madvise
>       0.63            +0.1        0.70        perf-profile.children.cycles-pp.__x64_sys_madvise
>       0.62            +0.1        0.70        perf-profile.children.cycles-pp.do_madvise
>       0.00            +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.can_modify_mm_madv
>       1.32            +0.1        1.46        perf-profile.children.cycles-pp.mas_next_slot
>      88.13            +0.7       88.83        perf-profile.children.cycles-pp.mremap
>      83.94            +0.9       84.88        perf-profile.children.cycles-pp.__do_sys_mremap
>      86.06            +0.9       87.00        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      85.56            +1.0       86.54        perf-profile.children.cycles-pp.do_syscall_64
>      40.49            +1.4       41.90        perf-profile.children.cycles-pp.do_vmi_munmap
>       2.10            +1.5        3.57        perf-profile.children.cycles-pp.do_munmap
>       3.62            +2.3        5.90        perf-profile.children.cycles-pp.mas_walk
>       5.44            +2.9        8.38        perf-profile.children.cycles-pp.mremap_to
>       5.30            +3.1        8.39        perf-profile.children.cycles-pp.mas_find
>       0.00            +5.4        5.40        perf-profile.children.cycles-pp.can_modify_mm
>      11.46            -0.5       10.96        perf-profile.self.cycles-pp.__slab_free
>       4.30            -0.2        4.08        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
>       2.51            -0.2        2.34        perf-profile.self.cycles-pp.rcu_cblist_dequeue
>       2.41 ±  2%      -0.2        2.25        perf-profile.self.cycles-pp.down_write
>       2.21            -0.1        2.11        perf-profile.self.cycles-pp.native_flush_tlb_one_user
>       2.37            -0.1        2.28        perf-profile.self.cycles-pp.mtree_load
>       1.60            -0.1        1.51        perf-profile.self.cycles-pp.__memcg_slab_free_hook
>       0.18 ±  3%      -0.1        0.10 ± 15%  perf-profile.self.cycles-pp.vm_stat_account
>       1.25            -0.1        1.18        perf-profile.self.cycles-pp.move_vma
>       1.76            -0.1        1.69        perf-profile.self.cycles-pp.mod_objcg_state
>       1.42            -0.1        1.35 ±  2%  perf-profile.self.cycles-pp.__call_rcu_common
>       1.41            -0.1        1.34        perf-profile.self.cycles-pp.mas_wr_walk
>       1.52            -0.1        1.46        perf-profile.self.cycles-pp.up_write
>       1.02            -0.1        0.95        perf-profile.self.cycles-pp.mas_prev_slot
>       0.96            -0.1        0.90 ±  2%  perf-profile.self.cycles-pp.vm_area_free_rcu_cb
>       1.50            -0.1        1.45        perf-profile.self.cycles-pp.kmem_cache_free
>       0.69 ±  3%      -0.1        0.64 ±  2%  perf-profile.self.cycles-pp.rcu_all_qs
>       1.14 ±  2%      -0.1        1.09        perf-profile.self.cycles-pp.shuffle_freelist
>       1.10            -0.1        1.05        perf-profile.self.cycles-pp.__cond_resched
>       1.40            -0.0        1.35        perf-profile.self.cycles-pp.do_vmi_align_munmap
>       0.99            -0.0        0.94        perf-profile.self.cycles-pp.mas_preallocate
>       0.88            -0.0        0.83        perf-profile.self.cycles-pp.___slab_alloc
>       0.55            -0.0        0.50        perf-profile.self.cycles-pp.mremap_to
>       0.98            -0.0        0.93        perf-profile.self.cycles-pp.move_ptes
>       0.78            -0.0        0.74        perf-profile.self.cycles-pp.percpu_counter_add_batch
>       0.21 ±  2%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64
>       0.44 ±  2%      -0.0        0.40 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>       0.92            -0.0        0.89        perf-profile.self.cycles-pp.mas_store_gfp
>       0.86            -0.0        0.82        perf-profile.self.cycles-pp.mas_pop_node
>       0.50            -0.0        0.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       1.15            -0.0        1.12        perf-profile.self.cycles-pp.clear_bhb_loop
>       1.14            -0.0        1.11        perf-profile.self.cycles-pp.vma_merge
>       0.66            -0.0        0.63        perf-profile.self.cycles-pp.__split_vma
>       0.16 ±  6%      -0.0        0.13 ±  7%  perf-profile.self.cycles-pp.cap_vm_enough_memory
>       0.82            -0.0        0.79        perf-profile.self.cycles-pp.mas_wr_store_entry
>       0.54 ±  2%      -0.0        0.52        perf-profile.self.cycles-pp.get_old_pud
>       0.43            -0.0        0.40        perf-profile.self.cycles-pp.do_munmap
>       0.51 ±  2%      -0.0        0.48 ±  2%  perf-profile.self.cycles-pp.security_mmap_addr
>       0.50            -0.0        0.48        perf-profile.self.cycles-pp.refill_obj_stock
>       0.24            -0.0        0.22        perf-profile.self.cycles-pp.mas_prev
>       0.71            -0.0        0.69        perf-profile.self.cycles-pp.unmap_page_range
>       0.48            -0.0        0.45        perf-profile.self.cycles-pp.find_vma_prev
>       0.42            -0.0        0.40        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.66            -0.0        0.64        perf-profile.self.cycles-pp.mas_store_prealloc
>       0.31            -0.0        0.29        perf-profile.self.cycles-pp.mas_prev_setup
>       0.43            -0.0        0.41        perf-profile.self.cycles-pp.mas_wr_end_piv
>       0.78            -0.0        0.76        perf-profile.self.cycles-pp.userfaultfd_unmap_complete
>       0.28            -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.mas_put_in_tree
>       0.42            -0.0        0.40        perf-profile.self.cycles-pp.mremap_userfaultfd_prep
>       0.28            -0.0        0.26        perf-profile.self.cycles-pp.free_pgtables
>       0.39            -0.0        0.37        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.30 ±  2%      -0.0        0.28        perf-profile.self.cycles-pp.zap_pmd_range
>       0.32            -0.0        0.31        perf-profile.self.cycles-pp.unmap_vmas
>       0.21            -0.0        0.20        perf-profile.self.cycles-pp.__get_unmapped_area
>       0.18 ±  2%      -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.lru_add_drain_cpu
>       0.06            -0.0        0.05        perf-profile.self.cycles-pp.ksm_madvise
>       0.45            +0.0        0.46        perf-profile.self.cycles-pp.do_vmi_munmap
>       0.37            +0.0        0.39        perf-profile.self.cycles-pp._raw_spin_lock
>       1.06            +0.1        1.18        perf-profile.self.cycles-pp.mas_next_slot
>       1.50            +0.5        1.97        perf-profile.self.cycles-pp.mas_find
>       0.00            +1.4        1.35        perf-profile.self.cycles-pp.can_modify_mm
>       3.13            +2.0        5.13        perf-profile.self.cycles-pp.mas_walk
>
>
> ***************************************************************************************************
> lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/pkey/stress-ng/60s
>
> commit:
>   ff388fe5c4 ("mseal: wire up mseal syscall")
>   8be7258aad ("mseal: add mseal syscall")
>
> ff388fe5c481d39c 8be7258aad44b5e25977a98db13
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>      10539            -2.5%      10273        vmstat.system.cs
>       0.28 ±  5%     -20.1%       0.22 ±  7%  sched_debug.cfs_rq:/.h_nr_running.stddev
>       1419 ±  7%     -15.3%       1202 ±  6%  sched_debug.cfs_rq:/.util_avg.max
>       0.28 ±  6%     -18.4%       0.23 ±  8%  sched_debug.cpu.nr_running.stddev
>  8.736e+08            -3.6%  8.423e+08        stress-ng.pkey.ops
>   14560560            -3.6%   14038795        stress-ng.pkey.ops_per_sec
>     770.39 ±  4%      -5.0%     732.04        stress-ng.time.user_time
>     244657 ±  3%      +5.8%     258782 ±  3%  proc-vmstat.nr_slab_unreclaimable
>   73133541            -2.1%   71588873        proc-vmstat.numa_hit
>   72873579            -2.1%   71357274        proc-vmstat.numa_local
>  1.842e+08            -2.5%  1.796e+08        proc-vmstat.pgalloc_normal
>  1.767e+08            -2.8%  1.717e+08        proc-vmstat.pgfree
>    1345346 ± 40%     -73.1%     362064 ±124%  numa-vmstat.node0.nr_inactive_anon
>    1345340 ± 40%     -73.1%     362062 ±124%  numa-vmstat.node0.nr_zone_inactive_anon
>    2420830 ± 14%     +35.1%    3270248 ± 16%  numa-vmstat.node1.nr_file_pages
>    2067871 ± 13%     +51.5%    3132982 ± 17%  numa-vmstat.node1.nr_inactive_anon
>     191406 ± 17%     +33.6%     255808 ± 14%  numa-vmstat.node1.nr_mapped
>       2452 ± 61%    +104.4%       5012 ± 35%  numa-vmstat.node1.nr_page_table_pages
>    2067853 ± 13%     +51.5%    3132966 ± 17%  numa-vmstat.node1.nr_zone_inactive_anon
>    5379238 ± 40%     -73.0%    1453605 ±123%  numa-meminfo.node0.Inactive
>    5379166 ± 40%     -73.0%    1453462 ±123%  numa-meminfo.node0.Inactive(anon)
>    8741077 ± 22%     -36.7%    5531290 ± 28%  numa-meminfo.node0.MemUsed
>    9651902 ± 13%     +35.8%   13105318 ± 16%  numa-meminfo.node1.FilePages
>    8239855 ± 13%     +52.4%   12556929 ± 17%  numa-meminfo.node1.Inactive
>    8239712 ± 13%     +52.4%   12556853 ± 17%  numa-meminfo.node1.Inactive(anon)
>     761944 ± 18%     +34.6%    1025906 ± 14%  numa-meminfo.node1.Mapped
>   11679628 ± 11%     +31.2%   15322841 ± 14%  numa-meminfo.node1.MemUsed
>       9874 ± 62%    +104.6%      20200 ± 36%  numa-meminfo.node1.PageTables
>       0.74            -4.2%       0.71        perf-stat.i.MPKI
>  1.245e+11            +2.3%  1.274e+11        perf-stat.i.branch-instructions
>       0.37            -0.0        0.35        perf-stat.i.branch-miss-rate%
>  4.359e+08            -2.1%  4.265e+08        perf-stat.i.branch-misses
>  4.672e+08            -2.6%  4.548e+08        perf-stat.i.cache-misses
>  7.276e+08            -2.7%  7.082e+08        perf-stat.i.cache-references
>       1.00            -1.6%       0.98        perf-stat.i.cpi
>       1364            +2.9%       1404        perf-stat.i.cycles-between-cache-misses
>  6.392e+11            +1.7%  6.499e+11        perf-stat.i.instructions
>       1.00            +1.6%       1.02        perf-stat.i.ipc
>       0.74            -4.3%       0.71        perf-stat.overall.MPKI
>       0.35            -0.0        0.33        perf-stat.overall.branch-miss-rate%
>       1.00            -1.6%       0.99        perf-stat.overall.cpi
>       1356            +2.9%       1395        perf-stat.overall.cycles-between-cache-misses
>       1.00            +1.6%       1.01        perf-stat.overall.ipc
>  1.209e+11            +1.9%  1.232e+11        perf-stat.ps.branch-instructions
>  4.188e+08            -2.6%  4.077e+08        perf-stat.ps.branch-misses
>  4.585e+08            -3.1%  4.441e+08        perf-stat.ps.cache-misses
>  7.124e+08            -3.1%  6.901e+08        perf-stat.ps.cache-references
>      10321            -2.6%      10053        perf-stat.ps.context-switches
>
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>


* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-04 20:32 ` Linus Torvalds
  2024-08-05 13:33   ` Pedro Falcato
@ 2024-08-05 17:54   ` Jeff Xu
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff Xu @ 2024-08-05 17:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Andrew Morton,
	Kees Cook, Liam R. Howlett, Pedro Falcato, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Sun, Aug 4, 2024 at 1:33 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, 4 Aug 2024 at 01:59, kernel test robot <oliver.sang@intel.com> wrote:
> >
> > kernel test robot noticed a -4.4% regression of stress-ng.pagemove.page_remaps_per_sec on
> > commit 8be7258aad44 ("mseal: add mseal syscall")
>
> Ok, it's basically just the vma walk in can_modify_mm():
>
> >       1.06            +0.1        1.18        perf-profile.self.cycles-pp.mas_next_slot
> >       1.50            +0.5        1.97        perf-profile.self.cycles-pp.mas_find
> >       0.00            +1.4        1.35        perf-profile.self.cycles-pp.can_modify_mm
> >       3.13            +2.0        5.13        perf-profile.self.cycles-pp.mas_walk
>
> and looks like it's two different pathways. We have __do_sys_mremap ->
> mremap_to -> do_munmap -> do_vmi_munmap -> can_modify_mm for the
> destination mapping, but we also have mremap_to() calling
> can_modify_mm() directly for the source mapping.
>

There are two scenarios in the mremap syscall:
1> mremap_to (relocate vma)
2> shrink/expand.
These two scenarios are handled by different code paths:

For case 1>
mremap_to (relocate vma)
-> can_modify_mm , check src for sealing.
-> if MREMAP_FIXED
->-> do_munmap (dst) // free dst
->->-> do_vmi_munmap (dst)
->->->-> can_modify_mm (dst) // check dst for sealing
-> if dst size is smaller (shrink case)
->-> do_munmap(dst, to remove extra size)
->->-> do_vmi_munmap
->->->-> can_modify_mm(dst) (potentially a duplicate of the check for
MREMAP_FIXED; in practice the memory should already be unmapped, so the
cost is that of looking up a nonexistent memory range in the maple tree)

For case 2>
Shrink/Expand.
-> can_modify_mm, check whether addr is sealed
-> if dst size is smaller (shrink case)
->-> do_vmi_munmap(remove_extra_size)
->->-> can_modify_mm(addr) (This is redundant because addr is already checked)

For case 2, we could potentially improve this by passing a flag into
do_vmi_munmap() to indicate that sealing has already been checked by the
caller. (However, this idea has to be tested to show an actual gain.)
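
The case 2 idea can be made concrete with a small user-space model. This is only a sketch, not kernel code: every toy_* name is hypothetical, the VMA list is a plain array instead of a maple tree, and the unmapping itself is left out. It only counts how often the range is walked for the sealing check, with and without the proposed "already checked" flag.

```c
/* Toy user-space model, NOT kernel code: all toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
	unsigned long start, end;	/* half-open range [start, end) */
	bool sealed;
};

struct toy_mm {
	const struct toy_vma *vmas;	/* sorted, non-overlapping */
	size_t nr;
	unsigned long seal_walks;	/* number of sealing walks performed */
};

/* Stand-in for can_modify_mm(): false if a sealed VMA overlaps the range. */
static bool toy_can_modify(struct toy_mm *mm, unsigned long start,
			   unsigned long end)
{
	mm->seal_walks++;
	for (size_t i = 0; i < mm->nr; i++) {
		const struct toy_vma *v = &mm->vmas[i];

		if (v->start < end && start < v->end && v->sealed)
			return false;
	}
	return true;
}

/* Stand-in for do_vmi_munmap() taking the proposed "already checked" flag. */
static int toy_munmap(struct toy_mm *mm, unsigned long start,
		      unsigned long end, bool seal_checked)
{
	assert(start < end);
	if (!seal_checked && !toy_can_modify(mm, start, end))
		return -1;	/* the kernel returns -EPERM at this point */
	/* the actual unmapping is omitted in this model */
	return 0;
}

/* Case 2 (shrink): check the whole range once, then drop the tail. */
static int toy_shrink(struct toy_mm *mm, unsigned long addr,
		      unsigned long old_len, unsigned long new_len,
		      bool pass_flag)
{
	if (!toy_can_modify(mm, addr, addr + old_len))
		return -1;
	return toy_munmap(mm, addr + new_len, addr + old_len, pass_flag);
}
```

Calling toy_shrink() with pass_flag set to false walks the range twice; with pass_flag set to true it walks it once. Whether saving that walk is measurable in the real kernel is exactly what would still need testing.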

The reported regression is in mremap. I wonder why mprotect/munmap
don't show a similar impact, since they use the same pattern (one
extra out-of-place check for the memory range).

During version 9, I tested munmap/mprotect/madvise for perf [1]. The
test shows mseal adds 20-40 ns or 50-100 CPU cycles per call, which is
much smaller (about one tenth) than the change from 5.10 to 6.8. The
test uses multiple VMAs of various types [2]. The next step for me is
to run stress-ng.pagemove.page_remaps_per_sec to understand why
mremap shows a big regression number.

[1] https://lore.kernel.org/all/20240214151130.616240-1-jeffxu@chromium.org/
[2] https://github.com/peaktocreek/mmperf
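
For a rough per-call number on the mremap side, something like the following could be used. It is a minimal sketch for Linux, not the stress-ng pagemove test and not the mmperf code from [2]: the function name remap_ns_per_call() is made up, it moves a single page between two adjacent slots with MREMAP_MAYMOVE | MREMAP_FIXED, and it reports wall-clock time for the whole syscall. To isolate the cost of the sealing check, the same binary would have to be run on kernels with and without the mseal commit.

```c
/* Rough per-call mremap() timing sketch for Linux; not the stress-ng test. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for mremap() and the MREMAP_* flags */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Move one page back and forth between two adjacent slots and return the
 * average wall-clock cost of one mremap() call in nanoseconds, or -1.0
 * on failure.
 */
static double remap_ns_per_call(size_t iters)
{
	long pg = sysconf(_SC_PAGESIZE);
	struct timespec t0, t1;
	char *base, *cur, *other;

	assert(iters > 0);
	if (pg <= 0)
		return -1.0;

	/* Reserve two pages so that both slots are known to be ours. */
	base = mmap(NULL, 2 * (size_t)pg, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1.0;
	cur = base;
	other = base + pg;
	cur[0] = 1;	/* fault the page in so there is something to move */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t i = 0; i < iters; i++) {
		/* MREMAP_FIXED unmaps whatever is at the destination first. */
		void *p = mremap(cur, (size_t)pg, (size_t)pg,
				 MREMAP_MAYMOVE | MREMAP_FIXED, other);

		if (p == MAP_FAILED) {
			munmap(base, 2 * (size_t)pg);
			return -1.0;
		}
		other = cur;	/* the old slot is now a hole */
		cur = p;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(base, 2 * (size_t)pg);
	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

A single-threaded loop like this will not reproduce the contention seen with nr_threads=100%, so the result is only a lower bound on what the robot measured.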

Best regards,
-Jeff


> And then do_vmi_munmap() will do its *own* vma_find() after having
> done arch_unmap().
>
> And do_munmap() will obviously do its own vma lookup as part of
> calling vma_to_resize().
>
> So it looks like a large portion of this regression is because the
> mseal addition just ends up walking the vma list way too much.
>
>               Linus


* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 13:33   ` Pedro Falcato
@ 2024-08-05 18:10     ` Jeff Xu
  2024-08-05 18:55       ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Xu @ 2024-08-05 18:10 UTC (permalink / raw)
  To: Pedro Falcato
  Cc: Linus Torvalds, kernel test robot, Jeff Xu, oe-lkp, lkp,
	linux-kernel, Andrew Morton, Kees Cook, Liam R. Howlett,
	Dave Hansen, Greg Kroah-Hartman, Guenter Roeck, Jann Horn,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Mon, Aug 5, 2024 at 6:33 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> On Sun, Aug 4, 2024 at 9:33 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Sun, 4 Aug 2024 at 01:59, kernel test robot <oliver.sang@intel.com> wrote:
> > >
> > > kernel test robot noticed a -4.4% regression of stress-ng.pagemove.page_remaps_per_sec on
> > > commit 8be7258aad44 ("mseal: add mseal syscall")
> >
> > Ok, it's basically just the vma walk in can_modify_mm():
> >
> > >       1.06            +0.1        1.18        perf-profile.self.cycles-pp.mas_next_slot
> > >       1.50            +0.5        1.97        perf-profile.self.cycles-pp.mas_find
> > >       0.00            +1.4        1.35        perf-profile.self.cycles-pp.can_modify_mm
> > >       3.13            +2.0        5.13        perf-profile.self.cycles-pp.mas_walk
> >
> > and looks like it's two different pathways. We have __do_sys_mremap ->
> > mremap_to -> do_munmap -> do_vmi_munmap -> can_modify_mm for the
> > destination mapping, but we also have mremap_to() calling
> > can_modify_mm() directly for the source mapping.
> >
> > And then do_vmi_munmap() will do its *own* vma_find() after having
> > done arch_unmap().
> >
> > And do_munmap() will obviously do its own vma lookup as part of
> > calling vma_to_resize().
> >
> > So it looks like a large portion of this regression is because the
> > mseal addition just ends up walking the vma list way too much.
>
> Can we rollback the upfront checks "funny business" and just call
> can_modify_vma directly in relevant places? I still don't believe in
> the partial mprotect/munmap "security risks" that were stated in the
> mseal thread (and these operations can already fail for many other
> reasons than mseal) :)
>
Both the in-place check and the extra loop, implemented properly, will
prevent changes to sealed memory.

However, the extra loop makes it harder for an attacker to call munmap(0,
random large-size), because if one of the vmas in the range is sealed, the
whole operation becomes a no-op.
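
The difference between the two strategies can be illustrated with a toy user-space model. This is a sketch under simplifying assumptions, not the kernel implementation: the demo_* names are invented, VMAs live in a plain array, and a VMA is always unmapped as a whole (no splitting).

```c
/* Toy model of upfront vs in-place seal checks; demo_* names are made up. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_vma {
	unsigned long start, end;	/* half-open range [start, end) */
	bool sealed;
	bool mapped;
};

static bool demo_overlaps(const struct demo_vma *v, unsigned long start,
			  unsigned long end)
{
	return v->mapped && v->start < end && start < v->end;
}

/* Upfront check: scan the whole range first, refuse before touching it. */
static int demo_unmap_upfront(struct demo_vma *vmas, size_t nr,
			      unsigned long start, unsigned long end)
{
	assert(start < end);
	for (size_t i = 0; i < nr; i++)
		if (demo_overlaps(&vmas[i], start, end) && vmas[i].sealed)
			return -1;
	for (size_t i = 0; i < nr; i++)
		if (demo_overlaps(&vmas[i], start, end))
			vmas[i].mapped = false;
	return 0;
}

/* In-place check: test each VMA when it is reached; earlier ones are gone. */
static int demo_unmap_in_place(struct demo_vma *vmas, size_t nr,
			       unsigned long start, unsigned long end)
{
	assert(start < end);
	for (size_t i = 0; i < nr; i++) {
		if (!demo_overlaps(&vmas[i], start, end))
			continue;
		if (vmas[i].sealed)
			return -1;
		vmas[i].mapped = false;
	}
	return 0;
}

static size_t demo_count_mapped(const struct demo_vma *vmas, size_t nr)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++)
		n += vmas[i].mapped;
	return n;
}
```

With three VMAs where the middle one is sealed, both variants fail and the sealed VMA survives in both. The upfront variant leaves all three mapped, while the in-place variant has already removed the first one by the time it fails.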

> I don't mind taking a look myself, just want to make sure I'm not
> stepping on anyone's toes here.
>
One thing that you can't work around is that can_modify_mm must be
called prior to arch_unmap, which means an in-place check for munmap
is not possible.
(There is a recent patch / refactor by Liam R. Howlett in this area,
but I am not sure whether this restriction has been removed.)

> --
> Pedro


* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 18:10     ` Jeff Xu
@ 2024-08-05 18:55       ` Linus Torvalds
  2024-08-05 19:33         ` Linus Torvalds
  2024-08-05 19:37         ` Jeff Xu
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2024-08-05 18:55 UTC (permalink / raw)
  To: Jeff Xu, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Pedro Falcato, kernel test robot, Jeff Xu, oe-lkp, lkp,
	linux-kernel, Andrew Morton, Kees Cook, Liam R. Howlett,
	Dave Hansen, Greg Kroah-Hartman, Guenter Roeck, Jann Horn,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

On Mon, 5 Aug 2024 at 11:11, Jeff Xu <jeffxu@google.com> wrote:
>
> One thing that you can't work around is that can_modify_mm must be
> called prior to arch_unmap, which means an in-place check for munmap
> is not possible.

Actually, we should move 'arch_unmap()'.

There is only one user of it, and it's pretty pointless.

(Ok, there are two users - x86 also has an 'arch_unmap()', but it's empty).

The reason I say that the current user of arch_unmap() is pointless is
because this is what the powerpc user does:

  static inline void arch_unmap(struct mm_struct *mm,
                                unsigned long start, unsigned long end)
  {
        unsigned long vdso_base = (unsigned long)mm->context.vdso;

        if (start <= vdso_base && vdso_base < end)
                mm->context.vdso = NULL;
  }

and that would make sense if we didn't have an actual 'vma' that
matched the vdso. But we do.

I think this code may predate the whole "create a vma for the vdso"
code. Or maybe it was just always confused.

Anyway, what the code *should* do is that we should just have a
->close() function for special mappings, and call that in
special_mapping_close().
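
The shape of that idea, an optional per-mapping close hook dispatched from the generic close path, can be sketched in user space as follows. This is an illustrative model only: the sketch_* types and functions are invented stand-ins for mm_struct, vm_area_struct and vm_special_mapping, and nothing here is compiled against kernel headers.

```c
/* User-space model of an optional ->close() hook; sketch_* names are made up. */
#include <assert.h>
#include <stddef.h>

struct sketch_mm {
	void *vdso;		/* models mm->context.vdso */
};

struct sketch_vma;

struct sketch_special_mapping {
	const char *name;
	/* optional: called when the vma backing this mapping goes away */
	void (*close)(const struct sketch_special_mapping *sm,
		      struct sketch_vma *vma);
};

struct sketch_vma {
	struct sketch_mm *mm;
	const struct sketch_special_mapping *sm;  /* models vm_private_data */
};

/* Per-mapping hook: forget the vdso address once its vma is closed. */
static void sketch_vdso_close(const struct sketch_special_mapping *sm,
			      struct sketch_vma *vma)
{
	(void)sm;
	vma->mm->vdso = NULL;
}

/* Generic close path: dispatch only if the mapping registered a hook. */
static void sketch_special_mapping_close(struct sketch_vma *vma)
{
	const struct sketch_special_mapping *sm = vma->sm;

	assert(sm != NULL);
	if (sm->close)
		sm->close(sm, vma);
}
```

The point of the design is that the cleanup is tied to the vma that actually goes away, so no address-range comparison is needed in the munmap path and mappings without a hook cost one pointer test.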

This is an ENTIRELY UNTESTED patch that gets rid of this horrendous wart.

Michael / Nick / Christophe? Note that I didn't even compile-test this
on x86-64, much less on powerpc.

So please consider this a "maybe something like this" patch, but that
'arch_unmap()' really is pretty nasty.

Oh, and there was a bug in the error path of the powerpc vdso setup
code anyway. The patch fixes that too, although considering the
entirely untested nature of it, the "fixes" is laughably optimistic.

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 6309 bytes --]

 arch/powerpc/include/asm/mmu_context.h |  9 ---------
 arch/powerpc/kernel/vdso.c             | 12 +++++++++++-
 arch/x86/include/asm/mmu_context.h     |  5 -----
 include/asm-generic/mm_hooks.h         | 11 +++--------
 include/linux/mm_types.h               |  2 ++
 mm/mmap.c                              | 15 ++++++---------
 6 files changed, 22 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 37bffa0f7918..a334a1368848 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -260,15 +260,6 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
 
 extern void arch_exit_mmap(struct mm_struct *mm);
 
-static inline void arch_unmap(struct mm_struct *mm,
-			      unsigned long start, unsigned long end)
-{
-	unsigned long vdso_base = (unsigned long)mm->context.vdso;
-
-	if (start <= vdso_base && vdso_base < end)
-		mm->context.vdso = NULL;
-}
-
 #ifdef CONFIG_PPC_MEM_KEYS
 bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
 			       bool execute, bool foreign);
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 7a2ff9010f17..4de8af43f920 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -81,12 +81,20 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
 	return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
 }
 
+static void vvar_close(const struct vm_special_mapping *sm,
+		       struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	mm->context.vdso = NULL;
+}
+
 static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 			     struct vm_area_struct *vma, struct vm_fault *vmf);
 
 static struct vm_special_mapping vvar_spec __ro_after_init = {
 	.name = "[vvar]",
 	.fault = vvar_fault,
+	.close = vvar_close,
 };
 
 static struct vm_special_mapping vdso32_spec __ro_after_init = {
@@ -207,8 +215,10 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
 	vma = _install_special_mapping(mm, vdso_base, vvar_size,
 				       VM_READ | VM_MAYREAD | VM_IO |
 				       VM_DONTDUMP | VM_PFNMAP, &vvar_spec);
-	if (IS_ERR(vma))
+	if (IS_ERR(vma)) {
+		mm->context.vdso = NULL;
 		return PTR_ERR(vma);
+	}
 
 	/*
 	 * our vma flags don't have VM_WRITE so by default, the process isn't
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 8dac45a2c7fc..80f2a3187aa6 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -232,11 +232,6 @@ static inline bool is_64bit_mm(struct mm_struct *mm)
 }
 #endif
 
-static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
-			      unsigned long end)
-{
-}
-
 /*
  * We only want to enforce protection keys on the current process
  * because we effectively have no access to PKRU for other
diff --git a/include/asm-generic/mm_hooks.h b/include/asm-generic/mm_hooks.h
index 4dbb177d1150..6eea3b3c1e65 100644
--- a/include/asm-generic/mm_hooks.h
+++ b/include/asm-generic/mm_hooks.h
@@ -1,8 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * Define generic no-op hooks for arch_dup_mmap, arch_exit_mmap
- * and arch_unmap to be included in asm-FOO/mmu_context.h for any
- * arch FOO which doesn't need to hook these.
+ * Define generic no-op hooks for arch_dup_mmap and arch_exit_mmap
+ * to be included in asm-FOO/mmu_context.h for any arch FOO which
+ * doesn't need to hook these.
  */
 #ifndef _ASM_GENERIC_MM_HOOKS_H
 #define _ASM_GENERIC_MM_HOOKS_H
@@ -17,11 +17,6 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
 {
 }
 
-static inline void arch_unmap(struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-}
-
 static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
 		bool write, bool execute, bool foreign)
 {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 485424979254..ef32d87a3adc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1313,6 +1313,8 @@ struct vm_special_mapping {
 
 	int (*mremap)(const struct vm_special_mapping *sm,
 		     struct vm_area_struct *new_vma);
+	void (*close)(const struct vm_special_mapping *sm,
+		      struct vm_area_struct *vma);
 };
 
 enum tlb_flush_reason {
diff --git a/mm/mmap.c b/mm/mmap.c
index d0dfc85b209b..adaaf1ef197a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2789,7 +2789,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
  *
  * This function takes a @mas that is either pointing to the previous VMA or set
  * to MA_START and sets it up to remove the mapping(s).  The @len will be
- * aligned and any arch_unmap work will be preformed.
+ * aligned.
  *
  * Return: 0 on success and drops the lock if so directed, error and leaves the
  * lock held otherwise.
@@ -2809,16 +2809,12 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 		return -EINVAL;
 
 	/*
-	 * Check if memory is sealed before arch_unmap.
-	 * Prevent unmapping a sealed VMA.
+	 * Check if memory is sealed, prevent unmapping a sealed VMA.
 	 * can_modify_mm assumes we have acquired the lock on MM.
 	 */
 	if (unlikely(!can_modify_mm(mm, start, end)))
 		return -EPERM;
 
-	 /* arch_unmap() might do unmaps itself.  */
-	arch_unmap(mm, start, end);
-
 	/* Find the first overlapping VMA */
 	vma = vma_find(vmi, end);
 	if (!vma) {
@@ -3232,14 +3228,12 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 
 	/*
-	 * Check if memory is sealed before arch_unmap.
-	 * Prevent unmapping a sealed VMA.
+	 * Check if memory is sealed, prevent unmapping a sealed VMA.
 	 * can_modify_mm assumes we have acquired the lock on MM.
 	 */
 	if (unlikely(!can_modify_mm(mm, start, end)))
 		return -EPERM;
 
-	arch_unmap(mm, start, end);
 	return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
 }
 
@@ -3624,6 +3618,9 @@ static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
  */
 static void special_mapping_close(struct vm_area_struct *vma)
 {
+	const struct vm_special_mapping *sm = vma->vm_private_data;
+	if (sm->close)
+		sm->close(sm, vma);
 }
 
 static const char *special_mapping_name(struct vm_area_struct *vma)


* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 18:55       ` Linus Torvalds
@ 2024-08-05 19:33         ` Linus Torvalds
  2024-08-06  2:14           ` Michael Ellerman
  2024-08-06  6:04           ` Oliver Sang
  2024-08-05 19:37         ` Jeff Xu
  1 sibling, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2024-08-05 19:33 UTC (permalink / raw)
  To: Jeff Xu, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Pedro Falcato, kernel test robot, Jeff Xu, oe-lkp, lkp,
	linux-kernel, Andrew Morton, Kees Cook, Liam R. Howlett,
	Dave Hansen, Greg Kroah-Hartman, Guenter Roeck, Jann Horn,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

[-- Attachment #1: Type: text/plain, Size: 601 bytes --]

On Mon, 5 Aug 2024 at 11:55, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So please consider this a "maybe something like this" patch, but that
> 'arch_unmap()' really is pretty nasty

Actually, the whole powerpc vdso code confused me. It's not the vvar
thing that wants this close thing, it's the other ones that have the
remap thing.

.. and there were two of those error cases that needed to reset the
vdso pointer.

That all shows just how carefully I was reading this code.

New version - still untested, but now I've read through it one more
time - attached.

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 6923 bytes --]

 arch/powerpc/include/asm/mmu_context.h |  9 ---------
 arch/powerpc/kernel/vdso.c             | 17 +++++++++++++++--
 arch/x86/include/asm/mmu_context.h     |  5 -----
 include/asm-generic/mm_hooks.h         | 11 +++--------
 include/linux/mm_types.h               |  2 ++
 mm/mmap.c                              | 15 ++++++---------
 6 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 37bffa0f7918..a334a1368848 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -260,15 +260,6 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
 
 extern void arch_exit_mmap(struct mm_struct *mm);
 
-static inline void arch_unmap(struct mm_struct *mm,
-			      unsigned long start, unsigned long end)
-{
-	unsigned long vdso_base = (unsigned long)mm->context.vdso;
-
-	if (start <= vdso_base && vdso_base < end)
-		mm->context.vdso = NULL;
-}
-
 #ifdef CONFIG_PPC_MEM_KEYS
 bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
 			       bool execute, bool foreign);
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 7a2ff9010f17..6fa041a6690a 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -81,6 +81,13 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
 	return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
 }
 
+static void vvar_close(const struct vm_special_mapping *sm,
+		       struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	mm->context.vdso = NULL;
+}
+
 static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 			     struct vm_area_struct *vma, struct vm_fault *vmf);
 
@@ -92,11 +99,13 @@ static struct vm_special_mapping vvar_spec __ro_after_init = {
 static struct vm_special_mapping vdso32_spec __ro_after_init = {
 	.name = "[vdso]",
 	.mremap = vdso32_mremap,
+	.close = vvar_close,
 };
 
 static struct vm_special_mapping vdso64_spec __ro_after_init = {
 	.name = "[vdso]",
 	.mremap = vdso64_mremap,
+	.close = vvar_close,
 };
 
 #ifdef CONFIG_TIME_NS
@@ -207,8 +216,10 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
 	vma = _install_special_mapping(mm, vdso_base, vvar_size,
 				       VM_READ | VM_MAYREAD | VM_IO |
 				       VM_DONTDUMP | VM_PFNMAP, &vvar_spec);
-	if (IS_ERR(vma))
+	if (IS_ERR(vma)) {
+		mm->context.vdso = NULL;
 		return PTR_ERR(vma);
+	}
 
 	/*
 	 * our vma flags don't have VM_WRITE so by default, the process isn't
@@ -223,8 +234,10 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
 	vma = _install_special_mapping(mm, vdso_base + vvar_size, vdso_size,
 				       VM_READ | VM_EXEC | VM_MAYREAD |
 				       VM_MAYWRITE | VM_MAYEXEC, vdso_spec);
-	if (IS_ERR(vma))
+	if (IS_ERR(vma)) {
+		mm->context.vdso = NULL;
 		do_munmap(mm, vdso_base, vvar_size, NULL);
+	}
 
 	return PTR_ERR_OR_ZERO(vma);
 }
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 8dac45a2c7fc..80f2a3187aa6 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -232,11 +232,6 @@ static inline bool is_64bit_mm(struct mm_struct *mm)
 }
 #endif
 
-static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
-			      unsigned long end)
-{
-}
-
 /*
  * We only want to enforce protection keys on the current process
  * because we effectively have no access to PKRU for other
diff --git a/include/asm-generic/mm_hooks.h b/include/asm-generic/mm_hooks.h
index 4dbb177d1150..6eea3b3c1e65 100644
--- a/include/asm-generic/mm_hooks.h
+++ b/include/asm-generic/mm_hooks.h
@@ -1,8 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * Define generic no-op hooks for arch_dup_mmap, arch_exit_mmap
- * and arch_unmap to be included in asm-FOO/mmu_context.h for any
- * arch FOO which doesn't need to hook these.
+ * Define generic no-op hooks for arch_dup_mmap and arch_exit_mmap
+ * to be included in asm-FOO/mmu_context.h for any arch FOO which
+ * doesn't need to hook these.
  */
 #ifndef _ASM_GENERIC_MM_HOOKS_H
 #define _ASM_GENERIC_MM_HOOKS_H
@@ -17,11 +17,6 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
 {
 }
 
-static inline void arch_unmap(struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-}
-
 static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
 		bool write, bool execute, bool foreign)
 {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 485424979254..ef32d87a3adc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1313,6 +1313,8 @@ struct vm_special_mapping {
 
 	int (*mremap)(const struct vm_special_mapping *sm,
 		     struct vm_area_struct *new_vma);
+	void (*close)(const struct vm_special_mapping *sm,
+		      struct vm_area_struct *vma);
 };
 
 enum tlb_flush_reason {
diff --git a/mm/mmap.c b/mm/mmap.c
index d0dfc85b209b..adaaf1ef197a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2789,7 +2789,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
  *
  * This function takes a @mas that is either pointing to the previous VMA or set
  * to MA_START and sets it up to remove the mapping(s).  The @len will be
- * aligned and any arch_unmap work will be preformed.
+ * aligned.
  *
  * Return: 0 on success and drops the lock if so directed, error and leaves the
  * lock held otherwise.
@@ -2809,16 +2809,12 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 		return -EINVAL;
 
 	/*
-	 * Check if memory is sealed before arch_unmap.
-	 * Prevent unmapping a sealed VMA.
+	 * Check if memory is sealed, prevent unmapping a sealed VMA.
 	 * can_modify_mm assumes we have acquired the lock on MM.
 	 */
 	if (unlikely(!can_modify_mm(mm, start, end)))
 		return -EPERM;
 
-	 /* arch_unmap() might do unmaps itself.  */
-	arch_unmap(mm, start, end);
-
 	/* Find the first overlapping VMA */
 	vma = vma_find(vmi, end);
 	if (!vma) {
@@ -3232,14 +3228,12 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 
 	/*
-	 * Check if memory is sealed before arch_unmap.
-	 * Prevent unmapping a sealed VMA.
+	 * Check if memory is sealed, prevent unmapping a sealed VMA.
 	 * can_modify_mm assumes we have acquired the lock on MM.
 	 */
 	if (unlikely(!can_modify_mm(mm, start, end)))
 		return -EPERM;
 
-	arch_unmap(mm, start, end);
 	return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
 }
 
@@ -3624,6 +3618,9 @@ static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
  */
 static void special_mapping_close(struct vm_area_struct *vma)
 {
+	const struct vm_special_mapping *sm = vma->vm_private_data;
+	if (sm->close)
+		sm->close(sm, vma);
 }
 
 static const char *special_mapping_name(struct vm_area_struct *vma)

^ permalink raw reply related	[flat|nested] 29+ messages in thread
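
[Annotation, not part of the original message.] The last hunk of the diff above makes special_mapping_close() forward to an optional per-mapping close callback. The stand-alone C model below mimics that dispatch in user space, as a rough sketch only. Every `model_` name is a hypothetical stand-in rather than a kernel definition, and `model_vdso_close` merely illustrates how an architecture might clear a cached vdso pointer from such a hook.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct model_mm {
	void *vdso;                     /* models mm->context.vdso */
};

struct model_vma;

struct model_special_mapping {
	const char *name;
	/* Optional hook, modeled on the ->close() member added in the diff. */
	void (*close)(const struct model_special_mapping *sm,
		      struct model_vma *vma);
};

struct model_vma {
	struct model_mm *mm;            /* models vma->vm_mm */
	const void *private_data;       /* models vma->vm_private_data */
};

/* Mirrors the patched special_mapping_close(): call the hook if present. */
void model_special_mapping_close(struct model_vma *vma)
{
	const struct model_special_mapping *sm = vma->private_data;

	if (sm->close)
		sm->close(sm, vma);
}

/* Illustrative hook: forget the cached vdso address when the VMA goes away. */
void model_vdso_close(const struct model_special_mapping *sm,
		      struct model_vma *vma)
{
	(void)sm;
	vma->mm->vdso = NULL;
}
```

A mapping that leaves `close` as NULL behaves as before the patch, which is why the hook is checked before it is called.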

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 18:55       ` Linus Torvalds
  2024-08-05 19:33         ` Linus Torvalds
@ 2024-08-05 19:37         ` Jeff Xu
  2024-08-05 19:48           ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Jeff Xu @ 2024-08-05 19:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Pedro Falcato, kernel test robot, Jeff Xu, oe-lkp, lkp,
	linux-kernel, Andrew Morton, Kees Cook, Liam R. Howlett,
	Dave Hansen, Greg Kroah-Hartman, Guenter Roeck, Jann Horn,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Mon, Aug 5, 2024 at 12:01 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, 5 Aug 2024 at 11:11, Jeff Xu <jeffxu@google.com> wrote:
> >
> > One thing that you can't walk around is that can_modify_mm must be
> > called prior to arch_unmap, that means in-place check for the munmap
> > is not possible.
>
> Actually, we should move 'arch_unmap()'.
>
I think you meant "remove"

> There is only one user of it, and it's pretty pointless.
>
> (Ok, there are two users - x86 also has an 'arch_unmap()', but it's empty).
>
> The reason I say that the current user of arch_unmap() is pointless is
> because this is what the powerpc user does:
>
>   static inline void arch_unmap(struct mm_struct *mm,
>                                 unsigned long start, unsigned long end)
>   {
>         unsigned long vdso_base = (unsigned long)mm->context.vdso;
>
>         if (start <= vdso_base && vdso_base < end)
>                 mm->context.vdso = NULL;
>   }
>
> and that would make sense if we didn't have an actual 'vma' that
> matched the vdso. But we do.
>
> I think this code may predate the whole "create a vma for the vdso"
> code. Or maybe it was just always confused.
>
Agree it is best to remove.

> Anyway, what the code *should* do is that we should just have a
> ->close() function for special mappings, and call that in
> special_mapping_close().
>
I'm curious, why does ppc need to unmap vdso ? ( other archs don't
have unmap logic.)

The vdso has .mremap; IIUC, that is for the CHECKPOINT_RESTORE feature, i.e.
during restore, the vdso might get relocated after being taken from the dump. [1]
IIUC, the vdso mapping doesn't change during the lifetime of the process.
Or does it in some use cases?

[1] https://lore.kernel.org/linux-mm/20161101172214.2938-1-dsafonov@virtuozzo.com/


> This is an ENTIRELY UNTESTED patch that gets rid of this horrendous wart.
>
> Michael / Nick / Christophe? Note that I didn't even compile-test this
> on x86-64, much less on powerpc.
>
> So please consider this a "maybe something like this" patch, but that
> 'arch_unmap()' really is pretty nasty.
>
> Oh, and there was a bug in the error path of the powerpc vdso setup
> code anyway. The patch fixes that too, although considering the
> entirely untested nature of it, the "fixes" is laughably optimistic.
>
>                  Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread
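
[Annotation, not part of the original message.] The powerpc arch_unmap() quoted above reduces to a half-open range test on the vdso base address. The user-space model below is a minimal sketch of that test, assuming a hypothetical `model_context` struct in place of the real mm context and using 0 where the kernel code uses NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the mm context; only the vdso base is modeled. */
struct model_context {
	unsigned long vdso;     /* 0 means "no vdso recorded" */
};

/* True when the half-open range [start, end) contains addr. */
bool model_range_contains(unsigned long start, unsigned long end,
			  unsigned long addr)
{
	return start <= addr && addr < end;
}

/* Models the quoted arch_unmap(): drop the pointer if the range covers the base. */
void model_arch_unmap(struct model_context *ctx,
		      unsigned long start, unsigned long end)
{
	if (model_range_contains(start, end, ctx->vdso))
		ctx->vdso = 0;
}
```

Note that only the base address is tested, so in this model an unmap that starts above the base leaves the recorded pointer untouched.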

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 19:37         ` Jeff Xu
@ 2024-08-05 19:48           ` Linus Torvalds
  2024-08-05 19:50             ` Linus Torvalds
  2024-08-05 23:24             ` Nicholas Piggin
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2024-08-05 19:48 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Pedro Falcato, kernel test robot, Jeff Xu, oe-lkp, lkp,
	linux-kernel, Andrew Morton, Kees Cook, Liam R. Howlett,
	Dave Hansen, Greg Kroah-Hartman, Guenter Roeck, Jann Horn,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Mon, 5 Aug 2024 at 12:38, Jeff Xu <jeffxu@google.com> wrote:
>
> I'm curious, why does ppc need to unmap vdso ? ( other archs don't
> have unmap logic.)

I have no idea. There are comments about 'perf' getting confused about
mmap counts when 'context.vdso' isn't set up.

But x86 has the same context.vdso logic, and does *not* set the
pointer before installing the vma, for example. Also does not zero it
out on munmap(), although it does have the mremap logic.

For all I know it may all be entirely unnecessary, and could be
removed entirely.

          Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 19:48           ` Linus Torvalds
@ 2024-08-05 19:50             ` Linus Torvalds
  2024-08-05 23:24             ` Nicholas Piggin
  1 sibling, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2024-08-05 19:50 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Pedro Falcato, kernel test robot, Jeff Xu, oe-lkp, lkp,
	linux-kernel, Andrew Morton, Kees Cook, Liam R. Howlett,
	Dave Hansen, Greg Kroah-Hartman, Guenter Roeck, Jann Horn,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Mon, 5 Aug 2024 at 12:48, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But x86 has the same context.vdso logic, and does *not* set the
> pointer before installing the vma, for example. Also does not zero it
> out on munmap(), although it does have the mremap logic.

Oh, and the empty stale arch_unmap() code on the x86 side has never
been about the vdso thing, it was about some horrid MPX notification
that no longer exists.

In case people wonder like I did.

           Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 19:48           ` Linus Torvalds
  2024-08-05 19:50             ` Linus Torvalds
@ 2024-08-05 23:24             ` Nicholas Piggin
  2024-08-06  0:13               ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2024-08-05 23:24 UTC (permalink / raw)
  To: Linus Torvalds, Jeff Xu
  Cc: Michael Ellerman, Christophe Leroy, Pedro Falcato,
	kernel test robot, Jeff Xu, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Kees Cook, Liam R. Howlett, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jonathan Corbet,
	Jorge Lucangeli Obes, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin

On Tue Aug 6, 2024 at 5:48 AM AEST, Linus Torvalds wrote:
> On Mon, 5 Aug 2024 at 12:38, Jeff Xu <jeffxu@google.com> wrote:
> >
> > I'm curious, why does ppc need to unmap vdso ? ( other archs don't
> > have unmap logic.)
>
> I have no idea. There are comments about 'perf' getting confused about
> mmap counts when 'context.vdso' isn't set up.
>
> But x86 has the same context.vdso logic, and does *not* set the
> pointer before installing the vma, for example. Also does not zero it
> out on munmap(), although it does have the mremap logic.
>
> For all I know it may all be entirely unnecessary, and could be
> removed entirely.

I don't know much about the vdso code; it predates my involvement in ppc.
Commit 83d3f0e90c6c8 says CRIU (checkpoint restore in userspace) is
moving it around. Why CRIU wants to do that, I don't know.

Can userspace on other archs not unmap their vdsos?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 23:24             ` Nicholas Piggin
@ 2024-08-06  0:13               ` Linus Torvalds
  2024-08-06  1:22                 ` Jeff Xu
  2024-08-06  2:01                 ` Michael Ellerman
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2024-08-06  0:13 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Jeff Xu, Michael Ellerman, Christophe Leroy, Pedro Falcato,
	kernel test robot, Jeff Xu, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Kees Cook, Liam R. Howlett, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jonathan Corbet,
	Jorge Lucangeli Obes, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin

On Mon, 5 Aug 2024 at 16:25, Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Can userspace on other archs not unmap their vdsos?

I think they can, and nobody cares. The "context.vdso" value stays at
some stale value, and anybody who tries to use it will just fail.

So what makes powerpc special is not "you can unmap the vdso", but
"powerpc cares".

I just don't quite know _why_ powerpc cares.

Judging by the comments and a quick 'grep', the reason may be

    arch/powerpc/perf/callchain_32.c

which seems to have some vdso knowledge.

But x86 does something kind of like that at signal frame generation
time, and doesn't care.

I really think it's an issue of "if you screw with the vdso, you get
to keep both broken pieces".

           Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06  0:13               ` Linus Torvalds
@ 2024-08-06  1:22                 ` Jeff Xu
  2024-08-06  2:01                 ` Michael Ellerman
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff Xu @ 2024-08-06  1:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicholas Piggin, Michael Ellerman, Christophe Leroy,
	Pedro Falcato, kernel test robot, Jeff Xu, oe-lkp, lkp,
	linux-kernel, Andrew Morton, Kees Cook, Liam R. Howlett,
	Dave Hansen, Greg Kroah-Hartman, Guenter Roeck, Jann Horn,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Mon, Aug 5, 2024 at 5:13 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, 5 Aug 2024 at 16:25, Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > Can userspace on other archs not unmap their vdsos?
>
> I think they can, and nobody cares. The "context.vdso" value stays at
> some stale value, and anybody who tries to use it will just fail.
>
I want to seal the vdso :-), so I also care (about it not being changeable
from userspace).

For the restore scenario, if the vdso is sealed, I guess CRIU won't be
able to relocate the vdso from userspace. I'm interested in hearing the
vdso devs' input on this, e.g. is it possible to make CRIU
compatible with memory sealing?

> So what makes powerpc special is not "you can unmap the vdso", but
> "powerpc cares".
>
> I just don't quite know _why_ powerpc cares.
>
> Judging by the comments and a quick 'grep', the reason may be
>
>     arch/powerpc/perf/callchain_32.c
>
> which seems to have some vdso knowledge.
>
> But x86 does something kind of like that at signal frame generation
> time, and doesn't care.
>
> I really think it's an issue of "if you screw with the vdso, you get
> to keep both broken pieces".
>
>            Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 16:58 ` Jeff Xu
@ 2024-08-06  1:44   ` Oliver Sang
  2024-08-06 14:54     ` Jeff Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Sang @ 2024-08-06  1:44 UTC (permalink / raw)
  To: Jeff Xu
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Kees Cook,
	Liam R. Howlett, Pedro Falcato, Dave Hansen, Greg Kroah-Hartman,
	Guenter Roeck, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jorge Lucangeli Obes, Linus Torvalds, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin, oliver.sang

hi, Jeff,

On Mon, Aug 05, 2024 at 09:58:33AM -0700, Jeff Xu wrote:
> On Sun, Aug 4, 2024 at 1:59 AM kernel test robot <oliver.sang@intel.com> wrote:
> >
> >
> >
> > Hello,
> >
> > kernel test robot noticed a -4.4% regression of stress-ng.pagemove.page_remaps_per_sec on:
> >
> >
> > commit: 8be7258aad44b5e25977a98db136f677fa6f4370 ("mseal: add mseal syscall")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > testcase: stress-ng
> > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > parameters:
> >
> >         nr_threads: 100%
> >         testtime: 60s
> >         test: pagemove
> >         cpufreq_governor: performance
> >
> >
> > In addition to that, the commit also has significant impact on the following tests:
> >
> > +------------------+---------------------------------------------------------------------------------------------+
> > | testcase: change | stress-ng: stress-ng.pkey.ops_per_sec -3.6% regression                                      |
> > | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
> > | test parameters  | cpufreq_governor=performance                                                                |
> > |                  | nr_threads=100%                                                                             |
> > |                  | test=pkey                                                                                   |
> > |                  | testtime=60s                                                                                |
> > +------------------+---------------------------------------------------------------------------------------------+
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202408041602.caa0372-oliver.sang@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20240804/202408041602.caa0372-oliver.sang@intel.com
> >
> There is an error when I try to reproduce the test:

What's your OS? We support some distributions:
https://github.com/intel/lkp-tests?tab=readme-ov-file#supported-distributions

> 
> bin/lkp install job.yaml
> 
> --------------------------------------------------------
> Some packages could not be installed. This may mean that you have
> requested an impossible situation or if you are using the unstable
> distribution that some required packages have not yet been created
> or been moved out of Incoming.
> The following information may help to resolve the situation:
> 
> The following packages have unmet dependencies:
>  libdw1 : Depends: libelf1 (= 0.190-1+b1)
>  libdw1t64 : Breaks: libdw1 (< 0.191-2)
> E: Unable to correct problems, you have held broken packages.
> Cannot install some packages of perf-c2c depends
> -----------------------------------------------------------------------------------------
> 
> And where is stress-ng.pagemove.page_remaps_per_sec test implemented,
> is that part of lkp-tests ?

stress-ng is in https://github.com/ColinIanKing/stress-ng

> 
> Thanks
> -Jeff
> 
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pagemove/stress-ng/60s
> >
> > commit:
> >   ff388fe5c4 ("mseal: wire up mseal syscall")
> >   8be7258aad ("mseal: add mseal syscall")
> >
> > ff388fe5c481d39c 8be7258aad44b5e25977a98db13
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >   41625945            -4.3%   39842322        proc-vmstat.numa_hit
> >   41559175            -4.3%   39774160        proc-vmstat.numa_local
> >   77484314            -4.4%   74105555        proc-vmstat.pgalloc_normal
> >   77205752            -4.4%   73826672        proc-vmstat.pgfree
> >   18361466            -4.2%   17596652        stress-ng.pagemove.ops
> >     306014            -4.2%     293262        stress-ng.pagemove.ops_per_sec
> >     205312            -4.4%     196176        stress-ng.pagemove.page_remaps_per_sec
> >       4961            +1.0%       5013        stress-ng.time.percent_of_cpu_this_job_got
> >       2917            +1.2%       2952        stress-ng.time.system_time
> >       1.07            -6.6%       1.00        perf-stat.i.MPKI
> >  3.354e+10            +3.5%  3.473e+10        perf-stat.i.branch-instructions
> >  1.795e+08            -4.2%  1.719e+08        perf-stat.i.cache-misses
> >  2.376e+08            -4.1%  2.279e+08        perf-stat.i.cache-references
> >       1.13            -3.0%       1.10        perf-stat.i.cpi
> >       1077            +4.3%       1124        perf-stat.i.cycles-between-cache-misses
> >  1.717e+11            +2.7%  1.762e+11        perf-stat.i.instructions
> >       0.88            +3.1%       0.91        perf-stat.i.ipc
> >       1.05            -6.8%       0.97        perf-stat.overall.MPKI
> >       0.25 ±  2%      -0.0        0.24        perf-stat.overall.branch-miss-rate%
> >       1.13            -3.0%       1.10        perf-stat.overall.cpi
> >       1084            +4.0%       1127        perf-stat.overall.cycles-between-cache-misses
> >       0.88            +3.1%       0.91        perf-stat.overall.ipc
> >  3.298e+10            +3.5%  3.415e+10        perf-stat.ps.branch-instructions
> >  1.764e+08            -4.3%  1.689e+08        perf-stat.ps.cache-misses
> >  2.336e+08            -4.1%   2.24e+08        perf-stat.ps.cache-references
> >     194.57            -2.4%     189.96 ±  2%  perf-stat.ps.cpu-migrations
> >  1.688e+11            +2.7%  1.733e+11        perf-stat.ps.instructions
> >  1.036e+13            +3.0%  1.068e+13        perf-stat.total.instructions
> >      75.12            -1.9       73.22        perf-profile.calltrace.cycles-pp.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> >      36.84            -1.6       35.29        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
> >      24.90            -1.2       23.72        perf-profile.calltrace.cycles-pp.copy_vma.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      19.89            -0.9       18.98        perf-profile.calltrace.cycles-pp.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> >      10.56 ±  2%      -0.8        9.78 ±  2%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
> >      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
> >      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> >      10.57 ±  2%      -0.8        9.80 ±  2%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> >      10.52 ±  2%      -0.8        9.75 ±  2%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
> >      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
> >      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> >      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> >      14.75            -0.7       14.07        perf-profile.calltrace.cycles-pp.vma_merge.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
> >       1.50            -0.6        0.94        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
> >       5.88 ±  2%      -0.4        5.47 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
> >       7.80            -0.3        7.47        perf-profile.calltrace.cycles-pp.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       4.55 ±  2%      -0.3        4.24 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
> >       6.76            -0.3        6.45        perf-profile.calltrace.cycles-pp.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
> >       6.15            -0.3        5.86        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap
> >       8.22            -0.3        7.93        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> >       6.12            -0.3        5.87        perf-profile.calltrace.cycles-pp.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
> >       5.74            -0.2        5.50        perf-profile.calltrace.cycles-pp.move_ptes.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
> >       3.16 ±  2%      -0.2        2.94        perf-profile.calltrace.cycles-pp.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
> >       5.50            -0.2        5.28        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap
> >       1.36            -0.2        1.14        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
> >       5.15            -0.2        4.94        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap
> >       5.51            -0.2        5.31        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_merge.copy_vma.move_vma.__do_sys_mremap
> >       5.16            -0.2        4.97        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma.move_vma
> >       2.24            -0.2        2.05        perf-profile.calltrace.cycles-pp.find_vma_prev.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
> >       2.60 ±  2%      -0.2        2.42 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs
> >       4.67            -0.2        4.49        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma
> >       3.41            -0.2        3.23        perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
> >       3.00            -0.2        2.83 ±  2%  perf-profile.calltrace.cycles-pp.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
> >       0.96            -0.2        0.80        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.do_munmap.mremap_to
> >       4.04            -0.2        3.88        perf-profile.calltrace.cycles-pp.vm_area_dup.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
> >       3.20 ±  2%      -0.2        3.04 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
> >       3.53            -0.1        3.38        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma.__do_sys_mremap
> >       3.40            -0.1        3.26        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma.__do_sys_mremap
> >       2.20 ±  2%      -0.1        2.06 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap
> >       1.84 ±  3%      -0.1        1.71 ±  3%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap
> >       1.78 ±  2%      -0.1        1.65 ±  3%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_merge.copy_vma.move_vma.__do_sys_mremap
> >       2.69            -0.1        2.56        perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap
> >       1.78 ±  2%      -0.1        1.66 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
> >       1.36 ±  2%      -0.1        1.23 ±  2%  perf-profile.calltrace.cycles-pp.rcu_cblist_dequeue.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
> >       0.95            -0.1        0.83        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap
> >       3.29            -0.1        3.17        perf-profile.calltrace.cycles-pp.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> >       2.08            -0.1        1.96        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_merge.copy_vma.move_vma.__do_sys_mremap
> >       1.43 ±  3%      -0.1        1.32 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_merge.copy_vma.move_vma
> >       2.21            -0.1        2.10        perf-profile.calltrace.cycles-pp.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
> >       2.47            -0.1        2.36        perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma
> >       2.21            -0.1        2.12        perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables
> >       1.41            -0.1        1.32        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap
> >       1.26            -0.1        1.18        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap
> >       1.82            -0.1        1.75        perf-profile.calltrace.cycles-pp.vma_link.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
> >       0.71            -0.1        0.63        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge.copy_vma.move_vma.__do_sys_mremap
> >       1.29            -0.1        1.22        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma
> >       0.61            -0.1        0.54        perf-profile.calltrace.cycles-pp.mas_prev_setup.mas_prev.vma_merge.copy_vma.move_vma
> >       1.36            -0.1        1.29        perf-profile.calltrace.cycles-pp.kmem_cache_free.unlink_anon_vmas.free_pgtables.unmap_region.do_vmi_align_munmap
> >       1.40            -0.1        1.33        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma.move_vma
> >       0.70            -0.1        0.64        perf-profile.calltrace.cycles-pp.mas_walk.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
> >       1.23            -0.1        1.17        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma
> >       1.66            -0.1        1.60        perf-profile.calltrace.cycles-pp.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       1.16            -0.1        1.10        perf-profile.calltrace.cycles-pp.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
> >       0.96            -0.1        0.90        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.unlink_anon_vmas.free_pgtables.unmap_region
> >       1.14            -0.1        1.08        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
> >       0.79            -0.1        0.74        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
> >       1.04            -0.1        1.00        perf-profile.calltrace.cycles-pp.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.58            -0.0        0.53        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.mremap
> >       0.61            -0.0        0.56        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
> >       0.56            -0.0        0.52        perf-profile.calltrace.cycles-pp.mas_find.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
> >       0.57            -0.0        0.53 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
> >       0.78            -0.0        0.74        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
> >       0.88            -0.0        0.84        perf-profile.calltrace.cycles-pp.mtree_load.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64
> >       0.70            -0.0        0.66        perf-profile.calltrace.cycles-pp.__call_rcu_common.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> >       0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> >       0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.mremap
> >       0.97            -0.0        0.93        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> >       1.11            -0.0        1.08        perf-profile.calltrace.cycles-pp.clear_bhb_loop.mremap
> >       0.75            -0.0        0.72        perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma
> >       0.74            -0.0        0.71        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap
> >       0.60 ±  2%      -0.0        0.57        perf-profile.calltrace.cycles-pp.security_mmap_addr.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64
> >       0.67 ±  2%      -0.0        0.64        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma
> >       0.82            -0.0        0.79        perf-profile.calltrace.cycles-pp.mtree_load.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> >       0.63            -0.0        0.60        perf-profile.calltrace.cycles-pp.mas_prev_slot.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> >       0.99            -0.0        0.96        perf-profile.calltrace.cycles-pp.mt_find.vma_merge.copy_vma.move_vma.__do_sys_mremap
> >       0.62 ±  2%      -0.0        0.59        perf-profile.calltrace.cycles-pp.get_old_pud.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
> >       0.87            -0.0        0.84        perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> >       0.78            -0.0        0.75        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_link.copy_vma.move_vma.__do_sys_mremap
> >       0.64            -0.0        0.62        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_link.copy_vma.move_vma.__do_sys_mremap
> >       0.90            -0.0        0.87        perf-profile.calltrace.cycles-pp.mtree_load.vma_merge.copy_vma.move_vma.__do_sys_mremap
> >       0.54            -0.0        0.52        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.move_ptes.move_page_tables.move_vma.__do_sys_mremap
> >       1.04            +0.0        1.08        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
> >       0.76            +0.1        0.83        perf-profile.calltrace.cycles-pp.__madvise
> >       0.63            +0.1        0.70        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> >       0.62            +0.1        0.70        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> >       0.66            +0.1        0.74        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
> >       0.66            +0.1        0.74        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> >      87.74            +0.7       88.45        perf-profile.calltrace.cycles-pp.mremap
> >       0.00            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.do_munmap
> >       0.00            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.mremap_to.__do_sys_mremap
> >      84.88            +0.9       85.77        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mremap
> >      84.73            +0.9       85.62        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> >       0.00            +0.9        0.92 ±  2%  perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.move_vma
> >      83.84            +0.9       84.78        perf-profile.calltrace.cycles-pp.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> >       0.00            +1.1        1.06        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64
> >       0.00            +1.2        1.21        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to
> >       2.07            +1.5        3.55        perf-profile.calltrace.cycles-pp.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       1.58            +1.5        3.07        perf-profile.calltrace.cycles-pp.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64
> >       0.00            +1.5        1.52        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap
> >       0.00            +1.6        1.57        perf-profile.calltrace.cycles-pp.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.00            +1.7        1.72        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
> >       0.00            +2.0        2.01        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
> >       5.39            +2.9        8.32        perf-profile.calltrace.cycles-pp.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> >      75.29            -1.9       73.37        perf-profile.children.cycles-pp.move_vma
> >      37.06            -1.6       35.50        perf-profile.children.cycles-pp.do_vmi_align_munmap
> >      24.98            -1.2       23.80        perf-profile.children.cycles-pp.copy_vma
> >      19.99            -1.0       19.02        perf-profile.children.cycles-pp.handle_softirqs
> >      19.97            -1.0       19.00        perf-profile.children.cycles-pp.rcu_core
> >      19.95            -1.0       18.98        perf-profile.children.cycles-pp.rcu_do_batch
> >      19.98            -0.9       19.06        perf-profile.children.cycles-pp.__split_vma
> >      17.55            -0.8       16.76        perf-profile.children.cycles-pp.kmem_cache_free
> >      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.children.cycles-pp.run_ksoftirqd
> >      10.57 ±  2%      -0.8        9.80 ±  2%  perf-profile.children.cycles-pp.smpboot_thread_fn
> >      15.38            -0.8       14.62        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
> >      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.children.cycles-pp.kthread
> >      10.62 ±  2%      -0.8        9.86 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
> >      10.62 ±  2%      -0.8        9.86 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
> >      15.14            -0.7       14.44        perf-profile.children.cycles-pp.vma_merge
> >      12.08            -0.5       11.55        perf-profile.children.cycles-pp.__slab_free
> >      12.11            -0.5       11.62        perf-profile.children.cycles-pp.mas_wr_store_entry
> >      10.86            -0.5       10.39        perf-profile.children.cycles-pp.vm_area_dup
> >      11.89            -0.5       11.44        perf-profile.children.cycles-pp.mas_store_prealloc
> >       8.49            -0.4        8.06        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
> >       9.88            -0.4        9.49        perf-profile.children.cycles-pp.mas_wr_node_store
> >       7.91            -0.3        7.58        perf-profile.children.cycles-pp.move_page_tables
> >       6.06            -0.3        5.78        perf-profile.children.cycles-pp.vm_area_free_rcu_cb
> >       8.28            -0.3        8.00        perf-profile.children.cycles-pp.unmap_region
> >       6.69            -0.3        6.42        perf-profile.children.cycles-pp.vma_complete
> >       5.06            -0.3        4.80        perf-profile.children.cycles-pp.mas_preallocate
> >       5.82            -0.2        5.57        perf-profile.children.cycles-pp.move_ptes
> >       4.24            -0.2        4.01        perf-profile.children.cycles-pp.anon_vma_clone
> >       3.50            -0.2        3.30        perf-profile.children.cycles-pp.down_write
> >       2.44            -0.2        2.25        perf-profile.children.cycles-pp.find_vma_prev
> >       3.46            -0.2        3.28        perf-profile.children.cycles-pp.___slab_alloc
> >       3.45            -0.2        3.27        perf-profile.children.cycles-pp.free_pgtables
> >       2.54            -0.2        2.37        perf-profile.children.cycles-pp.rcu_cblist_dequeue
> >       3.35            -0.2        3.18        perf-profile.children.cycles-pp.__memcg_slab_free_hook
> >       2.93            -0.2        2.78        perf-profile.children.cycles-pp.mas_alloc_nodes
> >       2.28 ±  2%      -0.2        2.12 ±  2%  perf-profile.children.cycles-pp.vma_prepare
> >       3.46            -0.1        3.32        perf-profile.children.cycles-pp.flush_tlb_mm_range
> >       3.41            -0.1        3.27 ±  2%  perf-profile.children.cycles-pp.mod_objcg_state
> >       2.76            -0.1        2.63        perf-profile.children.cycles-pp.unlink_anon_vmas
> >       3.41            -0.1        3.28        perf-profile.children.cycles-pp.mas_store_gfp
> >       2.21            -0.1        2.09        perf-profile.children.cycles-pp.__cond_resched
> >       2.04            -0.1        1.94        perf-profile.children.cycles-pp.allocate_slab
> >       2.10            -0.1        2.00        perf-profile.children.cycles-pp.__call_rcu_common
> >       2.51            -0.1        2.40        perf-profile.children.cycles-pp.flush_tlb_func
> >       1.04            -0.1        0.94        perf-profile.children.cycles-pp.mas_prev
> >       2.71            -0.1        2.61        perf-profile.children.cycles-pp.mtree_load
> >       2.23            -0.1        2.14        perf-profile.children.cycles-pp.native_flush_tlb_one_user
> >       0.22 ±  5%      -0.1        0.13 ± 13%  perf-profile.children.cycles-pp.vm_stat_account
> >       0.95            -0.1        0.87        perf-profile.children.cycles-pp.mas_prev_setup
> >       1.65            -0.1        1.57        perf-profile.children.cycles-pp.mas_wr_walk
> >       1.84            -0.1        1.76        perf-profile.children.cycles-pp.up_write
> >       1.27            -0.1        1.20        perf-profile.children.cycles-pp.mas_prev_slot
> >       1.84            -0.1        1.77        perf-profile.children.cycles-pp.vma_link
> >       1.39            -0.1        1.32        perf-profile.children.cycles-pp.shuffle_freelist
> >       0.96            -0.1        0.90 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
> >       0.86            -0.1        0.80        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> >       1.70            -0.1        1.64        perf-profile.children.cycles-pp.__get_unmapped_area
> >       0.34 ±  3%      -0.1        0.29 ±  5%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
> >       0.60            -0.0        0.55        perf-profile.children.cycles-pp.entry_SYSCALL_64
> >       0.92            -0.0        0.87        perf-profile.children.cycles-pp.percpu_counter_add_batch
> >       1.07            -0.0        1.02        perf-profile.children.cycles-pp.vma_to_resize
> >       1.59            -0.0        1.54        perf-profile.children.cycles-pp.mas_update_gap
> >       0.44 ±  2%      -0.0        0.40 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> >       0.70            -0.0        0.66        perf-profile.children.cycles-pp.syscall_return_via_sysret
> >       1.13            -0.0        1.09        perf-profile.children.cycles-pp.mt_find
> >       0.20 ±  6%      -0.0        0.17 ±  9%  perf-profile.children.cycles-pp.cap_vm_enough_memory
> >       0.99            -0.0        0.95        perf-profile.children.cycles-pp.mas_pop_node
> >       0.63 ±  2%      -0.0        0.59        perf-profile.children.cycles-pp.security_mmap_addr
> >       0.62            -0.0        0.59        perf-profile.children.cycles-pp.__put_partials
> >       1.17            -0.0        1.14        perf-profile.children.cycles-pp.clear_bhb_loop
> >       0.46            -0.0        0.43 ±  2%  perf-profile.children.cycles-pp.__alloc_pages_noprof
> >       0.44            -0.0        0.41 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
> >       0.90            -0.0        0.87        perf-profile.children.cycles-pp.userfaultfd_unmap_complete
> >       0.64 ±  2%      -0.0        0.62        perf-profile.children.cycles-pp.get_old_pud
> >       1.07            -0.0        1.05        perf-profile.children.cycles-pp.mas_leaf_max_gap
> >       0.22 ±  3%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.__rmqueue_pcplist
> >       0.55            -0.0        0.53        perf-profile.children.cycles-pp.refill_obj_stock
> >       0.25            -0.0        0.23 ±  3%  perf-profile.children.cycles-pp.rmqueue
> >       0.48            -0.0        0.45        perf-profile.children.cycles-pp.mremap_userfaultfd_prep
> >       0.33            -0.0        0.30        perf-profile.children.cycles-pp.free_unref_page
> >       0.46            -0.0        0.44        perf-profile.children.cycles-pp.setup_object
> >       0.21 ±  3%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.rmqueue_bulk
> >       0.31 ±  3%      -0.0        0.29        perf-profile.children.cycles-pp.__vm_enough_memory
> >       0.40            -0.0        0.38        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> >       0.36            -0.0        0.35        perf-profile.children.cycles-pp.madvise_vma_behavior
> >       0.54            -0.0        0.53 ±  2%  perf-profile.children.cycles-pp.mas_wr_end_piv
> >       0.46            -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
> >       0.34            -0.0        0.32 ±  2%  perf-profile.children.cycles-pp.mas_destroy
> >       0.28            -0.0        0.26 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_setup
> >       0.30            -0.0        0.28        perf-profile.children.cycles-pp.pte_offset_map_nolock
> >       0.19            -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.__thp_vma_allowable_orders
> >       0.08 ±  4%      -0.0        0.07        perf-profile.children.cycles-pp.ksm_madvise
> >       0.17            -0.0        0.16        perf-profile.children.cycles-pp.get_any_partial
> >       0.08            -0.0        0.07        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
> >       0.45            +0.0        0.47        perf-profile.children.cycles-pp._raw_spin_lock
> >       1.10            +0.0        1.14        perf-profile.children.cycles-pp.zap_pte_range
> >       0.78            +0.1        0.85        perf-profile.children.cycles-pp.__madvise
> >       0.63            +0.1        0.70        perf-profile.children.cycles-pp.__x64_sys_madvise
> >       0.62            +0.1        0.70        perf-profile.children.cycles-pp.do_madvise
> >       0.00            +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.can_modify_mm_madv
> >       1.32            +0.1        1.46        perf-profile.children.cycles-pp.mas_next_slot
> >      88.13            +0.7       88.83        perf-profile.children.cycles-pp.mremap
> >      83.94            +0.9       84.88        perf-profile.children.cycles-pp.__do_sys_mremap
> >      86.06            +0.9       87.00        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >      85.56            +1.0       86.54        perf-profile.children.cycles-pp.do_syscall_64
> >      40.49            +1.4       41.90        perf-profile.children.cycles-pp.do_vmi_munmap
> >       2.10            +1.5        3.57        perf-profile.children.cycles-pp.do_munmap
> >       3.62            +2.3        5.90        perf-profile.children.cycles-pp.mas_walk
> >       5.44            +2.9        8.38        perf-profile.children.cycles-pp.mremap_to
> >       5.30            +3.1        8.39        perf-profile.children.cycles-pp.mas_find
> >       0.00            +5.4        5.40        perf-profile.children.cycles-pp.can_modify_mm
> >      11.46            -0.5       10.96        perf-profile.self.cycles-pp.__slab_free
> >       4.30            -0.2        4.08        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
> >       2.51            -0.2        2.34        perf-profile.self.cycles-pp.rcu_cblist_dequeue
> >       2.41 ±  2%      -0.2        2.25        perf-profile.self.cycles-pp.down_write
> >       2.21            -0.1        2.11        perf-profile.self.cycles-pp.native_flush_tlb_one_user
> >       2.37            -0.1        2.28        perf-profile.self.cycles-pp.mtree_load
> >       1.60            -0.1        1.51        perf-profile.self.cycles-pp.__memcg_slab_free_hook
> >       0.18 ±  3%      -0.1        0.10 ± 15%  perf-profile.self.cycles-pp.vm_stat_account
> >       1.25            -0.1        1.18        perf-profile.self.cycles-pp.move_vma
> >       1.76            -0.1        1.69        perf-profile.self.cycles-pp.mod_objcg_state
> >       1.42            -0.1        1.35 ±  2%  perf-profile.self.cycles-pp.__call_rcu_common
> >       1.41            -0.1        1.34        perf-profile.self.cycles-pp.mas_wr_walk
> >       1.52            -0.1        1.46        perf-profile.self.cycles-pp.up_write
> >       1.02            -0.1        0.95        perf-profile.self.cycles-pp.mas_prev_slot
> >       0.96            -0.1        0.90 ±  2%  perf-profile.self.cycles-pp.vm_area_free_rcu_cb
> >       1.50            -0.1        1.45        perf-profile.self.cycles-pp.kmem_cache_free
> >       0.69 ±  3%      -0.1        0.64 ±  2%  perf-profile.self.cycles-pp.rcu_all_qs
> >       1.14 ±  2%      -0.1        1.09        perf-profile.self.cycles-pp.shuffle_freelist
> >       1.10            -0.1        1.05        perf-profile.self.cycles-pp.__cond_resched
> >       1.40            -0.0        1.35        perf-profile.self.cycles-pp.do_vmi_align_munmap
> >       0.99            -0.0        0.94        perf-profile.self.cycles-pp.mas_preallocate
> >       0.88            -0.0        0.83        perf-profile.self.cycles-pp.___slab_alloc
> >       0.55            -0.0        0.50        perf-profile.self.cycles-pp.mremap_to
> >       0.98            -0.0        0.93        perf-profile.self.cycles-pp.move_ptes
> >       0.78            -0.0        0.74        perf-profile.self.cycles-pp.percpu_counter_add_batch
> >       0.21 ±  2%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64
> >       0.44 ±  2%      -0.0        0.40 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> >       0.92            -0.0        0.89        perf-profile.self.cycles-pp.mas_store_gfp
> >       0.86            -0.0        0.82        perf-profile.self.cycles-pp.mas_pop_node
> >       0.50            -0.0        0.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> >       1.15            -0.0        1.12        perf-profile.self.cycles-pp.clear_bhb_loop
> >       1.14            -0.0        1.11        perf-profile.self.cycles-pp.vma_merge
> >       0.66            -0.0        0.63        perf-profile.self.cycles-pp.__split_vma
> >       0.16 ±  6%      -0.0        0.13 ±  7%  perf-profile.self.cycles-pp.cap_vm_enough_memory
> >       0.82            -0.0        0.79        perf-profile.self.cycles-pp.mas_wr_store_entry
> >       0.54 ±  2%      -0.0        0.52        perf-profile.self.cycles-pp.get_old_pud
> >       0.43            -0.0        0.40        perf-profile.self.cycles-pp.do_munmap
> >       0.51 ±  2%      -0.0        0.48 ±  2%  perf-profile.self.cycles-pp.security_mmap_addr
> >       0.50            -0.0        0.48        perf-profile.self.cycles-pp.refill_obj_stock
> >       0.24            -0.0        0.22        perf-profile.self.cycles-pp.mas_prev
> >       0.71            -0.0        0.69        perf-profile.self.cycles-pp.unmap_page_range
> >       0.48            -0.0        0.45        perf-profile.self.cycles-pp.find_vma_prev
> >       0.42            -0.0        0.40        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> >       0.66            -0.0        0.64        perf-profile.self.cycles-pp.mas_store_prealloc
> >       0.31            -0.0        0.29        perf-profile.self.cycles-pp.mas_prev_setup
> >       0.43            -0.0        0.41        perf-profile.self.cycles-pp.mas_wr_end_piv
> >       0.78            -0.0        0.76        perf-profile.self.cycles-pp.userfaultfd_unmap_complete
> >       0.28            -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.mas_put_in_tree
> >       0.42            -0.0        0.40        perf-profile.self.cycles-pp.mremap_userfaultfd_prep
> >       0.28            -0.0        0.26        perf-profile.self.cycles-pp.free_pgtables
> >       0.39            -0.0        0.37        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> >       0.30 ±  2%      -0.0        0.28        perf-profile.self.cycles-pp.zap_pmd_range
> >       0.32            -0.0        0.31        perf-profile.self.cycles-pp.unmap_vmas
> >       0.21            -0.0        0.20        perf-profile.self.cycles-pp.__get_unmapped_area
> >       0.18 ±  2%      -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.lru_add_drain_cpu
> >       0.06            -0.0        0.05        perf-profile.self.cycles-pp.ksm_madvise
> >       0.45            +0.0        0.46        perf-profile.self.cycles-pp.do_vmi_munmap
> >       0.37            +0.0        0.39        perf-profile.self.cycles-pp._raw_spin_lock
> >       1.06            +0.1        1.18        perf-profile.self.cycles-pp.mas_next_slot
> >       1.50            +0.5        1.97        perf-profile.self.cycles-pp.mas_find
> >       0.00            +1.4        1.35        perf-profile.self.cycles-pp.can_modify_mm
> >       3.13            +2.0        5.13        perf-profile.self.cycles-pp.mas_walk
> >
> >
> > ***************************************************************************************************
> > lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/pkey/stress-ng/60s
> >
> > commit:
> >   ff388fe5c4 ("mseal: wire up mseal syscall")
> >   8be7258aad ("mseal: add mseal syscall")
> >
> > ff388fe5c481d39c 8be7258aad44b5e25977a98db13
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >      10539            -2.5%      10273        vmstat.system.cs
> >       0.28 ±  5%     -20.1%       0.22 ±  7%  sched_debug.cfs_rq:/.h_nr_running.stddev
> >       1419 ±  7%     -15.3%       1202 ±  6%  sched_debug.cfs_rq:/.util_avg.max
> >       0.28 ±  6%     -18.4%       0.23 ±  8%  sched_debug.cpu.nr_running.stddev
> >  8.736e+08            -3.6%  8.423e+08        stress-ng.pkey.ops
> >   14560560            -3.6%   14038795        stress-ng.pkey.ops_per_sec
> >     770.39 ±  4%      -5.0%     732.04        stress-ng.time.user_time
> >     244657 ±  3%      +5.8%     258782 ±  3%  proc-vmstat.nr_slab_unreclaimable
> >   73133541            -2.1%   71588873        proc-vmstat.numa_hit
> >   72873579            -2.1%   71357274        proc-vmstat.numa_local
> >  1.842e+08            -2.5%  1.796e+08        proc-vmstat.pgalloc_normal
> >  1.767e+08            -2.8%  1.717e+08        proc-vmstat.pgfree
> >    1345346 ± 40%     -73.1%     362064 ±124%  numa-vmstat.node0.nr_inactive_anon
> >    1345340 ± 40%     -73.1%     362062 ±124%  numa-vmstat.node0.nr_zone_inactive_anon
> >    2420830 ± 14%     +35.1%    3270248 ± 16%  numa-vmstat.node1.nr_file_pages
> >    2067871 ± 13%     +51.5%    3132982 ± 17%  numa-vmstat.node1.nr_inactive_anon
> >     191406 ± 17%     +33.6%     255808 ± 14%  numa-vmstat.node1.nr_mapped
> >       2452 ± 61%    +104.4%       5012 ± 35%  numa-vmstat.node1.nr_page_table_pages
> >    2067853 ± 13%     +51.5%    3132966 ± 17%  numa-vmstat.node1.nr_zone_inactive_anon
> >    5379238 ± 40%     -73.0%    1453605 ±123%  numa-meminfo.node0.Inactive
> >    5379166 ± 40%     -73.0%    1453462 ±123%  numa-meminfo.node0.Inactive(anon)
> >    8741077 ± 22%     -36.7%    5531290 ± 28%  numa-meminfo.node0.MemUsed
> >    9651902 ± 13%     +35.8%   13105318 ± 16%  numa-meminfo.node1.FilePages
> >    8239855 ± 13%     +52.4%   12556929 ± 17%  numa-meminfo.node1.Inactive
> >    8239712 ± 13%     +52.4%   12556853 ± 17%  numa-meminfo.node1.Inactive(anon)
> >     761944 ± 18%     +34.6%    1025906 ± 14%  numa-meminfo.node1.Mapped
> >   11679628 ± 11%     +31.2%   15322841 ± 14%  numa-meminfo.node1.MemUsed
> >       9874 ± 62%    +104.6%      20200 ± 36%  numa-meminfo.node1.PageTables
> >       0.74            -4.2%       0.71        perf-stat.i.MPKI
> >  1.245e+11            +2.3%  1.274e+11        perf-stat.i.branch-instructions
> >       0.37            -0.0        0.35        perf-stat.i.branch-miss-rate%
> >  4.359e+08            -2.1%  4.265e+08        perf-stat.i.branch-misses
> >  4.672e+08            -2.6%  4.548e+08        perf-stat.i.cache-misses
> >  7.276e+08            -2.7%  7.082e+08        perf-stat.i.cache-references
> >       1.00            -1.6%       0.98        perf-stat.i.cpi
> >       1364            +2.9%       1404        perf-stat.i.cycles-between-cache-misses
> >  6.392e+11            +1.7%  6.499e+11        perf-stat.i.instructions
> >       1.00            +1.6%       1.02        perf-stat.i.ipc
> >       0.74            -4.3%       0.71        perf-stat.overall.MPKI
> >       0.35            -0.0        0.33        perf-stat.overall.branch-miss-rate%
> >       1.00            -1.6%       0.99        perf-stat.overall.cpi
> >       1356            +2.9%       1395        perf-stat.overall.cycles-between-cache-misses
> >       1.00            +1.6%       1.01        perf-stat.overall.ipc
> >  1.209e+11            +1.9%  1.232e+11        perf-stat.ps.branch-instructions
> >  4.188e+08            -2.6%  4.077e+08        perf-stat.ps.branch-misses
> >  4.585e+08            -3.1%  4.441e+08        perf-stat.ps.cache-misses
> >  7.124e+08            -3.1%  6.901e+08        perf-stat.ps.cache-references
> >      10321            -2.6%      10053        perf-stat.ps.context-switches
> >
> >
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06  0:13               ` Linus Torvalds
  2024-08-06  1:22                 ` Jeff Xu
@ 2024-08-06  2:01                 ` Michael Ellerman
  2024-08-06  2:15                   ` Linus Torvalds
  2024-09-13  5:47                   ` Christophe Leroy
  1 sibling, 2 replies; 29+ messages in thread
From: Michael Ellerman @ 2024-08-06  2:01 UTC (permalink / raw)
  To: Linus Torvalds, Nicholas Piggin
  Cc: Jeff Xu, Christophe Leroy, Pedro Falcato, kernel test robot,
	Jeff Xu, oe-lkp, lkp, linux-kernel, Andrew Morton, Kees Cook,
	Liam R. Howlett, Dave Hansen, Greg Kroah-Hartman, Guenter Roeck,
	Jann Horn, Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Mon, 5 Aug 2024 at 16:25, Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> Can userspace on other archs not unmap their vdsos?
>
> I think they can, and nobody cares. The "context.vdso" value stays at
> some stale value, and anybody who tries to use it will just fail.
>
> So what makes powerpc special is not "you can unmap the vdso", but
> "powerpc cares".
>
> I just don't quite know _why_ powerpc cares.

AFAIK for CRIU the problem is signal delivery:

arch/powerpc/kernel/signal_64.c:

int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
		struct task_struct *tsk)
{
        ...
	/* Set up to return from userspace. */
	if (tsk->mm->context.vdso) {
		regs_set_return_ip(regs, VDSO64_SYMBOL(tsk->mm->context.vdso, sigtramp_rt64));


ie. if the VDSO is moved but mm->context.vdso is not updated, signal
delivery will crash in userspace.
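
As a rough userspace sketch of that failure mode (a toy model, not kernel
code): every name below (toy_mm, toy_vdso_mremap, toy_vdso_close,
toy_signal_return_ip, TOY_SIGTRAMP_OFFSET) is made up for illustration, and
the "fallback" address only stands in for whatever the elided else-branch
does. The point is just that the return address is computed from the
recorded base, so the base has to follow the mapping:

```c
#include <stdint.h>

/* Toy stand-in for the per-mm state; hypothetical, not a kernel structure. */
struct toy_mm {
	uintptr_t vdso_base;	/* models mm->context.vdso; 0 means "no vDSO" */
};

/* Made-up offset of the signal trampoline inside the vDSO image. */
#define TOY_SIGTRAMP_OFFSET 0x400u

/* Models an mremap hook: the mapping moved, so record the new base. */
void toy_vdso_mremap(struct toy_mm *mm, uintptr_t new_base)
{
	mm->vdso_base = new_base;
}

/* Models a close hook: the mapping is gone, so forget the base. */
void toy_vdso_close(struct toy_mm *mm)
{
	mm->vdso_base = 0;
}

/*
 * Models the choice made while setting up a signal frame: return into the
 * vDSO trampoline if a base is recorded, otherwise use the caller-supplied
 * fallback address (an assumption of this sketch).
 */
uintptr_t toy_signal_return_ip(const struct toy_mm *mm, uintptr_t fallback)
{
	if (mm->vdso_base)
		return mm->vdso_base + TOY_SIGTRAMP_OFFSET;
	return fallback;
}
```

If the mapping moves and toy_vdso_mremap() is never called, the recorded
base is stale and toy_signal_return_ip() still hands back an address inside
the old location, which is the crash case described above.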

x86-64 always uses SA_RESTORER, and arm64 & s390 can use SA_RESTORER, so
I think CRIU uses that to avoid problems with signal delivery when the
VDSO is moved.

riscv doesn't support SA_RESTORER but I guess CRIU doesn't support riscv
yet so it's not become a problem.

There was a patch to support SA_RESTORER on powerpc, but I balked at
merging it because I couldn't find anyone on the glibc side to say
whether they wanted it or not. I guess I should have just merged it.

There was an attempt to unify all the vdso stuff and handle the
VDSO mremap case in generic code:

  https://lore.kernel.org/lkml/20210611180242.711399-17-dima@arista.com/

But I think that series got a bit big and complicated and Dmitry had to
move on to other things.

cheers

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 19:33         ` Linus Torvalds
@ 2024-08-06  2:14           ` Michael Ellerman
  2024-08-06  2:17             ` Linus Torvalds
  2024-08-06  6:04           ` Oliver Sang
  1 sibling, 1 reply; 29+ messages in thread
From: Michael Ellerman @ 2024-08-06  2:14 UTC (permalink / raw)
  To: Linus Torvalds, Jeff Xu, Nicholas Piggin, Christophe Leroy
  Cc: Pedro Falcato, kernel test robot, Jeff Xu, oe-lkp, lkp,
	linux-kernel, Andrew Morton, Kees Cook, Liam R. Howlett,
	Dave Hansen, Greg Kroah-Hartman, Guenter Roeck, Jann Horn,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Mon, 5 Aug 2024 at 11:55, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> So please consider this a "maybe something like this" patch, but that
>> 'arch_unmap()' really is pretty nasty
>
> Actually, the whole powerpc vdso code confused me. It's not the vvar
> thing that wants this close thing, it's the other ones that have the
> remap thing.
>
> .. and there were two of those error cases that needed to reset the
> vdso pointer.
>
> That all shows just how carefully I was reading this code.
>
> New version - still untested, but now I've read through it one more
> time - attached.

Needs a slight tweak to compile, vvar_close() needs to return void. And
should probably be renamed vdso_close(). Diff below if anyone else wants
to test it.

I'm testing it now, but it should do what we need.

cheers


diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 6fa041a6690a..431b46976db8 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -81,8 +81,8 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
 	return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
 }
 
-static int vvar_close(const struct vm_special_mapping *sm,
-		      struct vm_area_struct *vma)
+static void vdso_close(const struct vm_special_mapping *sm,
+                       struct vm_area_struct *vma)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	mm->context.vdso = NULL;
@@ -99,13 +99,13 @@ static struct vm_special_mapping vvar_spec __ro_after_init = {
 static struct vm_special_mapping vdso32_spec __ro_after_init = {
 	.name = "[vdso]",
 	.mremap = vdso32_mremap,
-	.close = vvar_close,
+	.close = vdso_close,
 };
 
 static struct vm_special_mapping vdso64_spec __ro_after_init = {
 	.name = "[vdso]",
 	.mremap = vdso64_mremap,
-	.close = vvar_close,
+	.close = vdso_close,
 };
 
 #ifdef CONFIG_TIME_NS

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06  2:01                 ` Michael Ellerman
@ 2024-08-06  2:15                   ` Linus Torvalds
  2024-09-13  5:47                   ` Christophe Leroy
  1 sibling, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2024-08-06  2:15 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nicholas Piggin, Jeff Xu, Christophe Leroy, Pedro Falcato,
	kernel test robot, Jeff Xu, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Kees Cook, Liam R. Howlett, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jonathan Corbet,
	Jorge Lucangeli Obes, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin

On Mon, 5 Aug 2024 at 19:01, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> >
> > I just don't quite know _why_ powerpc cares.
>
> AFAIK for CRIU the problem is signal delivery:

Hmm. Well, the patch I sent out should keep it all working.

In fact, to some degree it would make it much more straightforward for
other architectures to do the same thing.

Instead of a random "arch_munmap()" hack, it's a fairly reasonable
_install_special_mapping() extension.
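
To illustrate the shape of that extension, here is a minimal user-space sketch of a per-mapping close hook. It is not the kernel code: the *_model types and unmap_model() are hypothetical stand-ins for vm_special_mapping, vm_area_struct, mm_struct and the unmap path, chosen only to show how a .close callback can clear a cached vdso pointer when the mapping goes away.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct mm_model {
	void *vdso;			/* models mm->context.vdso */
};

struct vma_model {
	struct mm_model *mm;		/* models vma->vm_mm */
};

struct special_mapping_model {
	const char *name;
	/* Optional hook, called when the mapping is torn down. */
	void (*close)(const struct special_mapping_model *sm,
		      struct vma_model *vma);
};

/* Mirrors the idea of vdso_close(): forget the cached vdso base. */
void vdso_close_model(const struct special_mapping_model *sm,
		      struct vma_model *vma)
{
	(void)sm;
	vma->mm->vdso = NULL;
}

const struct special_mapping_model vdso_spec_model = {
	.name = "[vdso]",
	.close = vdso_close_model,
};

/* Models the generic unmap path: call the hook only if one is set. */
void unmap_model(const struct special_mapping_model *sm,
		 struct vma_model *vma)
{
	if (sm->close)
		sm->close(sm, vma);
}
```

The point of the design is that the generic path only needs a NULL check on the hook, so architectures without special needs pay nothing extra and no arch-specific call is needed in the unmap path.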

That said, the *other* thing I don't really understand is the strange
"we have to set the context.vdso value before calling
install_special_mapping":

        /*
         * Put vDSO base into mm struct. We need to do this before calling
         * install_special_mapping or the perf counter mmap tracking code
         * will fail to recognise it as a vDSO.
         */

and that looks odd too.

Anyway, I wish we could just get rid of all the horrible signal restore crap.

We used to just put it in the stack, and that worked really well apart
from the whole WX thing.

I wonder if we should just go back to that, and turn the resulting
"page fault due to non-executable stack" into a "sigreturn system
call".

And yes, SA_RESTORER is the right thing. It's basically just user
space telling us where it is. And happily, on x86-64 we just forced
the issue, and we do

        /* x86-64 should always use SA_RESTORER. */
        if (!(ksig->ka.sa.sa_flags & SA_RESTORER))
                return -EFAULT;

so you literally cannot have signals without it.
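
For reference, a minimal sketch of the user-space side, assuming a Linux/POSIX system with a C library such as glibc. The program never sets SA_RESTORER itself; the assumption here is that the C library's sigaction() wrapper supplies the restorer on x86-64 before making the system call. The names on_usr1() and deliver_sigusr1() are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal;

/* Trivial handler: only records that it ran. */
static void on_usr1(int signo)
{
	(void)signo;
	got_signal = 1;
}

/*
 * Install a SIGUSR1 handler through the C library and deliver the
 * signal to ourselves. Returns 1 if the handler ran, -1 on error.
 * Note that sa_flags is left at 0: any restorer is assumed to be
 * filled in by the libc wrapper, not by this code.
 */
int deliver_sigusr1(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_usr1;
	sigemptyset(&sa.sa_mask);

	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	got_signal = 0;
	if (raise(SIGUSR1) != 0)
		return -1;

	return got_signal;
}
```

Returning normally from on_usr1() is the step that depends on the restorer: that is where the sigreturn trampoline runs.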

             Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06  2:14           ` Michael Ellerman
@ 2024-08-06  2:17             ` Linus Torvalds
  2024-08-06 12:03               ` Michael Ellerman
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2024-08-06  2:17 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Jeff Xu, Nicholas Piggin, Christophe Leroy, Pedro Falcato,
	kernel test robot, Jeff Xu, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Kees Cook, Liam R. Howlett, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jonathan Corbet,
	Jorge Lucangeli Obes, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin

On Mon, 5 Aug 2024 at 19:14, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Needs a slight tweak to compile, vvar_close() needs to return void.

Ack, shows just how untested it was.

> And should probably be renamed vdso_close().

.. and that was due to the initial confusion that I then fixed, but
didn't fix the naming.

So yes, those fixes look ObviouslyCorrect(tm).

           Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-05 19:33         ` Linus Torvalds
  2024-08-06  2:14           ` Michael Ellerman
@ 2024-08-06  6:04           ` Oliver Sang
  2024-08-06 14:38             ` Linus Torvalds
  2024-08-06 21:37             ` Pedro Falcato
  1 sibling, 2 replies; 29+ messages in thread
From: Oliver Sang @ 2024-08-06  6:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Xu, Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Pedro Falcato, Jeff Xu, oe-lkp, lkp, linux-kernel, Andrew Morton,
	Kees Cook, Liam R. Howlett, Dave Hansen, Greg Kroah-Hartman,
	Guenter Roeck, Jann Horn, Jonathan Corbet, Jorge Lucangeli Obes,
	Matthew Wilcox, Muhammad Usama Anjum, Stephen Röttger,
	Suren Baghdasaryan, Amer Al Shanawany, Javier Carrasco,
	Shuah Khan, linux-api, linux-mm, ying.huang, feng.tang,
	fengwei.yin, oliver.sang

hi, Linus,

On Mon, Aug 05, 2024 at 12:33:58PM -0700, Linus Torvalds wrote:
> On Mon, 5 Aug 2024 at 11:55, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So please consider this a "maybe something like this" patch, but that
> > 'arch_unmap()' really is pretty nasty
> 
> Actually, the whole powerpc vdso code confused me. It's not the vvar
> thing that wants this close thing, it's the other ones that have the
> remap thing.
> 
> .. and there were two of those error cases that needed to reset the
> vdso pointer.
> 
> That all shows just how carefully I was reading this code.
> 
> New version - still untested, but now I've read through it one more
> time - attached.

We tested this version by applying it directly on top of 8be7258aad, but it seems
to have little impact on performance. There is still a similar regression when
compared to ff388fe5c4.

(The data for 8be7258aad and ff388fe5c4 differ slightly from what we had in the
previous report, since we reran the tests with the gcc-12 compiler. The 0day team
recently changed back from gcc-13 to gcc-12 due to some issues.)
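
As a reading aid for the comparison that follows, here is a small sketch of how the two kinds of change column appear to be derived. This is inferred from the numbers in this report, not taken from the LKP tooling: ordinary metrics look like relative changes against the base commit, while the perf-profile rows look like absolute differences in percentage points. The function names pct_change() and point_delta() are made up for this example.

```c
/*
 * Relative change of `value` against `base`, in percent.
 * Assumed to match the %change column for ordinary metrics.
 */
double pct_change(double base, double value)
{
	return (value - base) / base * 100.0;
}

/*
 * Absolute difference in percentage points.
 * Assumed to match the change column for perf-profile rows,
 * whose values are already percentages of cycles.
 */
double point_delta(double base, double value)
{
	return value - base;
}
```

With the stress-ng.pagemove.ops_per_sec figures from this run, pct_change(306349, 291188) is roughly -4.95 and pct_change(306349, 290931) is roughly -5.03, in line with the -4.9% and -5.0% shown. For the move_vma calltrace row, point_delta(74.97, 73.07) is about -1.9.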

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pagemove/stress-ng/60s

commit:
  ff388fe5c4 ("mseal: wire up mseal syscall")
  8be7258aad ("mseal: add mseal syscall")
  4605212a16  <--- your patch

ff388fe5c481d39c 8be7258aad44b5e25977a98db13 4605212a162071afdd9c713e936
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      4958            +1.3%       5024            +1.2%       5020        time.percent_of_cpu_this_job_got
      2916            +1.5%       2960            +1.4%       2957        time.system_time
     65.85            -7.0%      61.27            -7.0%      61.23        time.user_time
  41535129            -4.5%   39669773            -4.3%   39746835        proc-vmstat.numa_hit
  41465484            -4.5%   39602956            -4.3%   39677556        proc-vmstat.numa_local
  77303973            -4.6%   73780662            -4.4%   73912128        proc-vmstat.pgalloc_normal
  77022096            -4.6%   73502058            -4.4%   73637326        proc-vmstat.pgfree
  18381956            -4.9%   17473438            -5.0%   17457167        stress-ng.pagemove.ops
    306349            -4.9%     291188            -5.0%     290931        stress-ng.pagemove.ops_per_sec
    209930            -6.2%     196996 ±  2%      -7.6%     193911        stress-ng.pagemove.page_remaps_per_sec
      4958            +1.3%       5024            +1.2%       5020        stress-ng.time.percent_of_cpu_this_job_got
      2916            +1.5%       2960            +1.4%       2957        stress-ng.time.system_time
      1.13            -2.1%       1.10            -2.2%       1.10        perf-stat.i.cpi
      0.89            +2.2%       0.91            +2.1%       0.91        perf-stat.i.ipc
      1.04            -7.2%       0.97            -7.1%       0.97        perf-stat.overall.MPKI
      1.13            -2.3%       1.10            -2.2%       1.10        perf-stat.overall.cpi
      1082            +5.4%       1140            +5.3%       1139        perf-stat.overall.cycles-between-cache-misses
      0.89            +2.3%       0.91            +2.3%       0.91        perf-stat.overall.ipc
    192.79            -3.9%     185.32 ±  2%      -2.4%     188.21 ±  3%  perf-stat.ps.cpu-migrations
 1.048e+13            +2.8%  1.078e+13            +2.6%  1.075e+13        perf-stat.total.instructions
     74.97            -1.9       73.07            -2.1       72.88        perf-profile.calltrace.cycles-pp.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
     36.79            -1.6       35.22            -1.6       35.17        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
     24.98            -1.3       23.64            -1.4       23.57        perf-profile.calltrace.cycles-pp.copy_vma.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     19.91            -1.1       18.85            -1.1       18.83        perf-profile.calltrace.cycles-pp.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.6       10.06 ±  2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.6       10.06 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.6       10.06 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
     10.64 ±  3%      -0.9        9.79 ±  3%      -0.6       10.02 ±  2%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     10.63 ±  3%      -0.9        9.78 ±  3%      -0.6       10.01 ±  2%  perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
     10.63 ±  3%      -0.9        9.78 ±  3%      -0.6       10.01 ±  2%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     10.63 ±  3%      -0.9        9.78 ±  3%      -0.6       10.01 ±  2%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
     10.59 ±  3%      -0.8        9.74 ±  3%      -0.6        9.97 ±  2%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
     14.77            -0.8       14.00            -0.9       13.91        perf-profile.calltrace.cycles-pp.vma_merge.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      1.48            -0.5        0.99            -0.5        0.99        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
      5.95 ±  3%      -0.5        5.47 ±  3%      -0.4        5.59 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      7.88            -0.4        7.48            -0.4        7.44        perf-profile.calltrace.cycles-pp.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.62 ±  3%      -0.4        4.25 ±  3%      -0.3        4.35 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
      6.72            -0.4        6.36            -0.3        6.39        perf-profile.calltrace.cycles-pp.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      6.15            -0.3        5.82            -0.4        5.80        perf-profile.calltrace.cycles-pp.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      6.11            -0.3        5.78            -0.3        5.82        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      5.78            -0.3        5.49            -0.3        5.46        perf-profile.calltrace.cycles-pp.move_ptes.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
      5.54            -0.3        5.25            -0.3        5.22        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      5.56            -0.3        5.28            -0.3        5.24        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_merge.copy_vma.move_vma.__do_sys_mremap
      5.19            -0.3        4.92            -0.3        4.89        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap
      5.20            -0.3        4.94            -0.3        4.91        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma.move_vma
      3.20 ±  4%      -0.3        2.94 ±  3%      -0.2        3.01 ±  2%  perf-profile.calltrace.cycles-pp.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      4.09            -0.2        3.85            -0.2        3.85        perf-profile.calltrace.cycles-pp.vm_area_dup.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      4.68            -0.2        4.45            -0.3        4.41        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma
      2.63 ±  3%      -0.2        2.42 ±  3%      -0.2        2.48 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs
      2.36 ±  2%      -0.2        2.16 ±  4%      -0.2        2.17 ±  2%  perf-profile.calltrace.cycles-pp.__memcpy.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_complete
      3.56            -0.2        3.36            -0.2        3.37        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma.__do_sys_mremap
      4.00            -0.2        3.81            -0.2        3.78        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_complete.__split_vma
      1.35            -0.2        1.16            -0.2        1.17        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
      3.40            -0.2        3.22            -0.2        3.21        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma.__do_sys_mremap
      2.22            -0.2        2.06            -0.2        2.05        perf-profile.calltrace.cycles-pp.find_vma_prev.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      0.96            -0.2        0.82            -0.1        0.82        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.do_munmap.mremap_to
      3.25            -0.1        3.10            -0.2        3.10        perf-profile.calltrace.cycles-pp.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      1.81 ±  4%      -0.1        1.67 ±  3%      -0.1        1.71 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
      1.97 ±  3%      -0.1        1.83 ±  3%      -0.2        1.81 ±  4%  perf-profile.calltrace.cycles-pp.mod_objcg_state.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.__split_vma
      2.26            -0.1        2.12            -0.2        2.11        perf-profile.calltrace.cycles-pp.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      3.10            -0.1        2.96            -0.1        2.99        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
      3.13            -0.1        2.99            -0.1        3.00        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
      2.97            -0.1        2.85            -0.2        2.82        perf-profile.calltrace.cycles-pp.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      2.05            -0.1        1.93            -0.1        1.92        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_merge.copy_vma.move_vma.__do_sys_mremap
      8.26            -0.1        8.14            -0.1        8.16        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      2.45            -0.1        2.34            -0.1        2.34        perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma
      2.43            -0.1        2.32            -0.1        2.32        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.75 ±  2%      -0.1        1.64 ±  3%      -0.1        1.61 ±  3%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_merge.copy_vma.move_vma.__do_sys_mremap
      0.54            -0.1        0.44 ± 37%      -0.2        0.36 ± 63%  perf-profile.calltrace.cycles-pp.mas_find.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
      2.21            -0.1        2.11            -0.1        2.11        perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables
      1.27 ±  2%      -0.1        1.16 ±  4%      -0.1        1.18 ±  2%  perf-profile.calltrace.cycles-pp.__memcpy.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge
      1.32 ±  3%      -0.1        1.22 ±  3%      -0.1        1.25        perf-profile.calltrace.cycles-pp.rcu_cblist_dequeue.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      1.85            -0.1        1.76            -0.1        1.76        perf-profile.calltrace.cycles-pp.vma_link.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      2.14 ±  2%      -0.1        2.05 ±  2%      -0.1        2.00 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      1.40            -0.1        1.31            -0.1        1.30        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      1.77 ±  3%      -0.1        1.68 ±  2%      -0.1        1.64 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap
      1.39            -0.1        1.30            -0.1        1.30        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma.move_vma
      1.24            -0.1        1.16            -0.1        1.16        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap
      1.40 ±  3%      -0.1        1.32 ±  4%      -0.1        1.26 ±  5%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_merge.copy_vma.move_vma
      0.94            -0.1        0.86            -0.1        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap
      1.23            -0.1        1.15            -0.1        1.15        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma
      1.54            -0.1        1.46            -0.1        1.46        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
      0.73            -0.1        0.67            -0.1        0.67        perf-profile.calltrace.cycles-pp.mas_walk.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
      1.15            -0.1        1.09            -0.1        1.09        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
      0.60 ±  2%      -0.1        0.54            -0.1        0.54        perf-profile.calltrace.cycles-pp.security_mmap_addr.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64
      1.27            -0.1        1.21            -0.1        1.21        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma
      0.72            -0.1        0.66            -0.1        0.65        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge.copy_vma.move_vma.__do_sys_mremap
      0.70 ±  2%      -0.1        0.64 ±  3%      -0.1        0.64 ±  4%  perf-profile.calltrace.cycles-pp.mod_objcg_state.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.copy_vma
      0.79            -0.1        0.73            -0.1        0.73        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
      0.80 ±  2%      -0.1        0.75            -0.1        0.74 ±  2%  perf-profile.calltrace.cycles-pp.__call_rcu_common.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge
      0.78            -0.1        0.72            -0.1        0.72        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
      1.02            -0.1        0.96            -0.1        0.96        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
      1.63            -0.1        1.58            -0.1        1.57        perf-profile.calltrace.cycles-pp.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.62            -0.0        0.58            -0.1        0.57        perf-profile.calltrace.cycles-pp.mas_prev_setup.mas_prev.vma_merge.copy_vma.move_vma
      0.60 ±  3%      -0.0        0.56 ±  3%      -0.0        0.57 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
      0.86            -0.0        0.81            -0.0        0.81        perf-profile.calltrace.cycles-pp.mtree_load.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64
      0.67            -0.0        0.62            -0.0        0.63        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      1.02            -0.0        0.97            -0.0        0.97        perf-profile.calltrace.cycles-pp.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.76 ±  2%      -0.0        0.71            -0.0        0.72 ±  2%  perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma
      0.70            -0.0        0.66            -0.0        0.66        perf-profile.calltrace.cycles-pp.__call_rcu_common.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      0.67 ±  2%      -0.0        0.63            -0.0        0.63        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.mremap
      0.81            -0.0        0.77            -0.0        0.77        perf-profile.calltrace.cycles-pp.mtree_load.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
      0.56            -0.0        0.51            -0.1        0.44 ± 40%  perf-profile.calltrace.cycles-pp.mas_walk.mas_prev_setup.mas_prev.vma_merge.copy_vma
      0.98            -0.0        0.93            -0.0        0.93        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      0.69            -0.0        0.65            -0.0        0.65        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma
      0.78            -0.0        0.74            -0.0        0.74        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_link.copy_vma.move_vma.__do_sys_mremap
      1.12            -0.0        1.08            -0.0        1.07        perf-profile.calltrace.cycles-pp.clear_bhb_loop.mremap
      0.68            -0.0        0.65            -0.0        0.65        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_link.copy_vma.move_vma.__do_sys_mremap
      1.00            -0.0        0.97            -0.0        0.97        perf-profile.calltrace.cycles-pp.mt_find.vma_merge.copy_vma.move_vma.__do_sys_mremap
      0.62            -0.0        0.59            -0.0        0.59        perf-profile.calltrace.cycles-pp.mas_prev_slot.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      0.88            -0.0        0.85            -0.0        0.85        perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
      1.15            -0.0        1.12            -0.0        1.14        perf-profile.calltrace.cycles-pp.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      0.60            -0.0        0.57 ±  2%      -0.0        0.57        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.59            -0.0        0.56            -0.0        0.55        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.mremap
      0.62 ±  2%      -0.0        0.59 ±  2%      -0.0        0.58        perf-profile.calltrace.cycles-pp.get_old_pud.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
      0.65            -0.0        0.63            -0.0        0.62        perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma
      0.67            +0.1        0.74            +0.1        0.74        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
      0.76            +0.1        0.84            +0.1        0.83        perf-profile.calltrace.cycles-pp.__madvise
      0.66            +0.1        0.74            +0.1        0.74        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.63            +0.1        0.71            +0.1        0.71        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.62            +0.1        0.70            +0.1        0.70        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      3.47            +0.1        3.55            +0.1        3.56        perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
     87.67            +0.8       88.47            +0.6       88.26        perf-profile.calltrace.cycles-pp.mremap
      0.00            +0.9        0.86            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.do_munmap
      0.00            +0.9        0.88            +0.9        0.88        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.mremap_to.__do_sys_mremap
      0.00            +0.9        0.90 ±  2%      +0.9        0.89        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.move_vma
     84.82            +1.0       85.80            +0.8       85.60        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mremap
     84.66            +1.0       85.65            +0.8       85.45        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
     83.71            +1.0       84.73            +0.8       84.55        perf-profile.calltrace.cycles-pp.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
      0.00            +1.1        1.10            +1.1        1.10        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64
      0.00            +1.2        1.21            +1.2        1.22        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to
      2.09            +1.5        3.60            +1.5        3.60        perf-profile.calltrace.cycles-pp.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.5        1.51            +1.5        1.49        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap
      1.59            +1.5        3.12            +1.5        3.13        perf-profile.calltrace.cycles-pp.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64
      0.00            +1.6        1.62            +1.6        1.62        perf-profile.calltrace.cycles-pp.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.7        1.72            +1.7        1.73        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
      0.00            +2.0        2.01            +2.0        1.99        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
      5.34            +3.0        8.38            +3.0        8.37        perf-profile.calltrace.cycles-pp.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
     75.13            -1.9       73.22            -2.1       73.04        perf-profile.children.cycles-pp.move_vma
     37.01            -1.6       35.43            -1.6       35.38        perf-profile.children.cycles-pp.do_vmi_align_munmap
     25.06            -1.3       23.71            -1.4       23.65        perf-profile.children.cycles-pp.copy_vma
     20.00            -1.1       18.94            -1.1       18.91        perf-profile.children.cycles-pp.__split_vma
     19.86            -1.0       18.87            -1.0       18.89        perf-profile.children.cycles-pp.rcu_core
     19.84            -1.0       18.85            -1.0       18.87        perf-profile.children.cycles-pp.rcu_do_batch
     19.88            -1.0       18.89            -1.0       18.91        perf-profile.children.cycles-pp.handle_softirqs
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.6       10.06 ±  2%  perf-profile.children.cycles-pp.kthread
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.6       10.06 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.6       10.06 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
     10.64 ±  3%      -0.9        9.79 ±  3%      -0.6       10.02 ±  2%  perf-profile.children.cycles-pp.smpboot_thread_fn
     10.63 ±  3%      -0.9        9.78 ±  3%      -0.6       10.01 ±  2%  perf-profile.children.cycles-pp.run_ksoftirqd
     17.53            -0.8       16.70            -0.8       16.72        perf-profile.children.cycles-pp.kmem_cache_free
     15.28            -0.8       14.47            -0.8       14.48        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
     15.16            -0.8       14.37            -0.9       14.29        perf-profile.children.cycles-pp.vma_merge
     12.18            -0.6       11.54            -0.7       11.49        perf-profile.children.cycles-pp.mas_wr_store_entry
     11.98            -0.6       11.36            -0.7       11.30        perf-profile.children.cycles-pp.mas_store_prealloc
     12.11            -0.6       11.51            -0.6       11.51        perf-profile.children.cycles-pp.__slab_free
     10.86            -0.6       10.26            -0.6       10.30        perf-profile.children.cycles-pp.vm_area_dup
      9.89            -0.5        9.40            -0.6        9.33        perf-profile.children.cycles-pp.mas_wr_node_store
      8.36            -0.4        7.92            -0.4        7.91        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      7.98            -0.4        7.58            -0.4        7.55        perf-profile.children.cycles-pp.move_page_tables
      6.69            -0.4        6.33            -0.4        6.32        perf-profile.children.cycles-pp.vma_complete
      5.86            -0.3        5.56            -0.3        5.53        perf-profile.children.cycles-pp.move_ptes
      5.11            -0.3        4.81            -0.3        4.80        perf-profile.children.cycles-pp.mas_preallocate
      6.05            -0.3        5.75            -0.3        5.76        perf-profile.children.cycles-pp.vm_area_free_rcu_cb
      2.98 ±  2%      -0.3        2.73 ±  4%      -0.2        2.75 ±  2%  perf-profile.children.cycles-pp.__memcpy
      3.48            -0.2        3.26            -0.2        3.27        perf-profile.children.cycles-pp.___slab_alloc
      3.46 ±  2%      -0.2        3.26            -0.2        3.27 ±  2%  perf-profile.children.cycles-pp.mod_objcg_state
      2.91            -0.2        2.73            -0.2        2.73        perf-profile.children.cycles-pp.mas_alloc_nodes
      2.43            -0.2        2.25            -0.2        2.25        perf-profile.children.cycles-pp.find_vma_prev
      3.47            -0.2        3.29            -0.2        3.23 ±  2%  perf-profile.children.cycles-pp.down_write
      3.46            -0.2        3.28            -0.2        3.27        perf-profile.children.cycles-pp.flush_tlb_mm_range
      4.22            -0.2        4.06            -0.2        4.05        perf-profile.children.cycles-pp.anon_vma_clone
      3.32            -0.2        3.17            -0.1        3.18        perf-profile.children.cycles-pp.__memcg_slab_free_hook
      3.35            -0.2        3.20            -0.2        3.20        perf-profile.children.cycles-pp.mas_store_gfp
      2.22            -0.1        2.07            -0.1        2.07        perf-profile.children.cycles-pp.__cond_resched
      3.18            -0.1        3.04            -0.1        3.05        perf-profile.children.cycles-pp.unmap_vmas
      2.05 ±  2%      -0.1        1.91            -0.1        1.93 ±  2%  perf-profile.children.cycles-pp.allocate_slab
      2.24            -0.1        2.11 ±  2%      -0.2        2.08 ±  2%  perf-profile.children.cycles-pp.vma_prepare
      2.12            -0.1        2.00            -0.1        2.00        perf-profile.children.cycles-pp.__call_rcu_common
      2.66            -0.1        2.53            -0.1        2.54        perf-profile.children.cycles-pp.mtree_load
      2.46            -0.1        2.34            -0.1        2.34        perf-profile.children.cycles-pp.rcu_cblist_dequeue
      2.49            -0.1        2.38            -0.1        2.38        perf-profile.children.cycles-pp.flush_tlb_func
      8.32            -0.1        8.21            -0.1        8.22        perf-profile.children.cycles-pp.unmap_region
      2.48            -0.1        2.37            -0.1        2.37        perf-profile.children.cycles-pp.unmap_page_range
      2.23            -0.1        2.13            -0.1        2.13        perf-profile.children.cycles-pp.native_flush_tlb_one_user
      1.77            -0.1        1.67            -0.1        1.67        perf-profile.children.cycles-pp.mas_wr_walk
      1.88            -0.1        1.78            -0.1        1.78        perf-profile.children.cycles-pp.vma_link
      1.40            -0.1        1.31            -0.1        1.31        perf-profile.children.cycles-pp.shuffle_freelist
      1.84            -0.1        1.75            -0.1        1.76 ±  2%  perf-profile.children.cycles-pp.up_write
      0.97 ±  2%      -0.1        0.88            -0.1        0.88        perf-profile.children.cycles-pp.rcu_all_qs
      1.03            -0.1        0.95            -0.1        0.95        perf-profile.children.cycles-pp.mas_prev
      0.92            -0.1        0.85            -0.1        0.84        perf-profile.children.cycles-pp.mas_prev_setup
      1.58            -0.1        1.50            -0.1        1.50        perf-profile.children.cycles-pp.zap_pmd_range
      1.24            -0.1        1.17            -0.1        1.16        perf-profile.children.cycles-pp.mas_prev_slot
      1.58            -0.1        1.51            -0.1        1.51        perf-profile.children.cycles-pp.mas_update_gap
      0.62            -0.1        0.56            -0.1        0.56        perf-profile.children.cycles-pp.security_mmap_addr
      0.49 ±  2%      -0.1        0.43            -0.1        0.44 ±  2%  perf-profile.children.cycles-pp.setup_object
      0.90            -0.1        0.84            -0.1        0.85        perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.98            -0.1        0.92            -0.1        0.93        perf-profile.children.cycles-pp.mas_pop_node
      0.85            -0.1        0.80            -0.1        0.79        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.68            -0.1        1.62            -0.1        1.61        perf-profile.children.cycles-pp.__get_unmapped_area
      1.23            -0.1        1.18            -0.1        1.17        perf-profile.children.cycles-pp.__pte_offset_map_lock
      1.08            -0.1        1.03            -0.1        1.03        perf-profile.children.cycles-pp.zap_pte_range
      0.69 ±  2%      -0.0        0.64            -0.0        0.65        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.04            -0.0        1.00            -0.1        0.99        perf-profile.children.cycles-pp.vma_to_resize
      1.08            -0.0        1.04            -0.0        1.04        perf-profile.children.cycles-pp.mas_leaf_max_gap
      0.51 ±  3%      -0.0        0.47            -0.0        0.49 ±  4%  perf-profile.children.cycles-pp.anon_vma_interval_tree_insert
      1.18            -0.0        1.14            -0.1        1.13        perf-profile.children.cycles-pp.clear_bhb_loop
      0.57            -0.0        0.53            -0.0        0.53        perf-profile.children.cycles-pp.mas_wr_end_piv
      0.43            -0.0        0.40            -0.0        0.39        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      1.14            -0.0        1.10            -0.0        1.10        perf-profile.children.cycles-pp.mt_find
      0.46 ±  7%      -0.0        0.42 ±  2%      -0.0        0.42        perf-profile.children.cycles-pp._raw_spin_lock
      0.62            -0.0        0.58            -0.0        0.59        perf-profile.children.cycles-pp.__put_partials
      0.90            -0.0        0.87            -0.0        0.87        perf-profile.children.cycles-pp.userfaultfd_unmap_complete
      0.46 ±  3%      -0.0        0.42 ±  3%      -0.0        0.42 ±  3%  perf-profile.children.cycles-pp.__alloc_pages_noprof
      0.61            -0.0        0.58            -0.0        0.58        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.48            -0.0        0.45 ±  2%      -0.0        0.45        perf-profile.children.cycles-pp.mas_prev_range
      0.64            -0.0        0.61            -0.0        0.60        perf-profile.children.cycles-pp.get_old_pud
      0.31 ±  2%      -0.0        0.28 ±  3%      -0.0        0.28 ±  3%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      0.33 ±  3%      -0.0        0.30 ±  2%      -0.0        0.30 ±  3%  perf-profile.children.cycles-pp.mas_put_in_tree
      0.32 ±  2%      -0.0        0.29 ±  2%      -0.0        0.30 ±  2%  perf-profile.children.cycles-pp.tlb_finish_mmu
      0.47            -0.0        0.44 ±  2%      -0.0        0.44        perf-profile.children.cycles-pp.rcu_segcblist_enqueue
      0.33            -0.0        0.31            -0.0        0.32        perf-profile.children.cycles-pp.mas_destroy
      0.40            -0.0        0.39            -0.0        0.38        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.35            -0.0        0.34            -0.0        0.33        perf-profile.children.cycles-pp.__rb_insert_augmented
      0.25 ±  4%      -0.0        0.23 ±  3%      -0.0        0.23 ±  3%  perf-profile.children.cycles-pp.rmqueue
      0.39            -0.0        0.37            -0.0        0.37        perf-profile.children.cycles-pp.down_write_killable
      0.18 ±  3%      -0.0        0.17 ±  5%      -0.0        0.16 ±  5%  perf-profile.children.cycles-pp.cap_vm_enough_memory
      0.22 ±  4%      -0.0        0.20 ±  3%      -0.0        0.20 ±  3%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.21 ±  4%      -0.0        0.19 ±  3%      -0.0        0.19 ±  3%  perf-profile.children.cycles-pp.rmqueue_bulk
      0.52            -0.0        0.51 ±  2%      -0.0        0.50        perf-profile.children.cycles-pp.__pte_offset_map
      0.26            -0.0        0.24 ±  2%      -0.0        0.24 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.30 ±  2%      -0.0        0.28 ±  2%      -0.0        0.28 ±  3%  perf-profile.children.cycles-pp.__vm_enough_memory
      0.29            -0.0        0.27            -0.0        0.27 ±  3%  perf-profile.children.cycles-pp.tlb_gather_mmu
      0.16 ±  2%      -0.0        0.14 ±  3%      -0.0        0.14 ±  4%  perf-profile.children.cycles-pp.mas_wr_append
      0.28 ±  2%      -0.0        0.26            -0.0        0.26        perf-profile.children.cycles-pp.khugepaged_enter_vma
      0.32            -0.0        0.30            -0.0        0.30        perf-profile.children.cycles-pp.mas_wr_store_setup
      0.20 ±  2%      -0.0        0.18 ±  2%      -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.__thp_vma_allowable_orders
      0.32            -0.0        0.30            -0.0        0.30 ±  2%  perf-profile.children.cycles-pp.pte_offset_map_nolock
      0.09 ±  4%      -0.0        0.08 ±  5%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp.vma_dup_policy
      0.36            -0.0        0.35            -0.0        0.35        perf-profile.children.cycles-pp.madvise_vma_behavior
      0.16 ±  3%      -0.0        0.16 ±  2%      -0.0        0.15 ±  3%  perf-profile.children.cycles-pp._find_next_bit
      0.14 ±  3%      +0.0        0.15 ±  2%      +0.0        0.15        perf-profile.children.cycles-pp.free_pgd_range
      0.08 ±  4%      +0.0        0.10 ±  4%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.mm_get_unmapped_area_vmflags
      0.78            +0.1        0.85            +0.1        0.85        perf-profile.children.cycles-pp.__madvise
      0.63            +0.1        0.71            +0.1        0.71        perf-profile.children.cycles-pp.__x64_sys_madvise
      0.63            +0.1        0.70            +0.1        0.70        perf-profile.children.cycles-pp.do_madvise
      3.52            +0.1        3.60            +0.1        3.61        perf-profile.children.cycles-pp.free_pgtables
      0.00            +0.1        0.09            +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.can_modify_mm_madv
      1.30            +0.2        1.46            +0.2        1.46        perf-profile.children.cycles-pp.mas_next_slot
     88.06            +0.8       88.84            +0.6       88.64        perf-profile.children.cycles-pp.mremap
     83.81            +1.0       84.84            +0.8       84.65        perf-profile.children.cycles-pp.__do_sys_mremap
     85.98            +1.0       87.02            +0.8       86.82        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     85.50            +1.1       86.56            +0.9       86.36        perf-profile.children.cycles-pp.do_syscall_64
      2.12            +1.5        3.62            +1.5        3.63        perf-profile.children.cycles-pp.do_munmap
     40.41            +1.5       41.93            +1.5       41.86        perf-profile.children.cycles-pp.do_vmi_munmap
      3.62            +2.4        5.98            +2.3        5.95        perf-profile.children.cycles-pp.mas_walk
      5.40            +3.0        8.44            +3.0        8.43        perf-profile.children.cycles-pp.mremap_to
      5.26            +3.2        8.48            +3.2        8.47        perf-profile.children.cycles-pp.mas_find
      0.00            +5.5        5.46            +5.4        5.45        perf-profile.children.cycles-pp.can_modify_mm
     11.49            -0.6       10.92            -0.6       10.93        perf-profile.self.cycles-pp.__slab_free
      4.32            -0.2        4.07            -0.3        3.98 ±  2%  perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
      1.96            -0.2        1.80 ±  4%      -0.1        1.83 ±  2%  perf-profile.self.cycles-pp.__memcpy
      2.36 ±  2%      -0.1        2.24 ±  2%      -0.2        2.19 ±  2%  perf-profile.self.cycles-pp.down_write
      2.42            -0.1        2.30            -0.1        2.32        perf-profile.self.cycles-pp.rcu_cblist_dequeue
      2.33            -0.1        2.22            -0.1        2.23        perf-profile.self.cycles-pp.mtree_load
      2.21            -0.1        2.10            -0.1        2.10        perf-profile.self.cycles-pp.native_flush_tlb_one_user
      1.62            -0.1        1.54            -0.1        1.55 ±  2%  perf-profile.self.cycles-pp.__memcg_slab_free_hook
      1.52            -0.1        1.44            -0.1        1.44        perf-profile.self.cycles-pp.mas_wr_walk
      1.15 ±  2%      -0.1        1.07            -0.1        1.08        perf-profile.self.cycles-pp.shuffle_freelist
      1.53            -0.1        1.45            -0.1        1.47 ±  2%  perf-profile.self.cycles-pp.up_write
      1.44            -0.1        1.36            -0.1        1.36        perf-profile.self.cycles-pp.__call_rcu_common
      0.70 ±  2%      -0.1        0.62            -0.1        0.63        perf-profile.self.cycles-pp.rcu_all_qs
      1.72            -0.1        1.66            -0.1        1.66        perf-profile.self.cycles-pp.mod_objcg_state
      3.77            -0.1        3.70 ±  4%      -0.2        3.62 ±  2%  perf-profile.self.cycles-pp.mas_wr_node_store
      0.51 ±  3%      -0.1        0.45            -0.1        0.45        perf-profile.self.cycles-pp.security_mmap_addr
      0.94 ±  2%      -0.1        0.88 ±  4%      -0.1        0.88 ±  2%  perf-profile.self.cycles-pp.vm_area_dup
      1.18            -0.1        1.12            -0.1        1.12        perf-profile.self.cycles-pp.vma_merge
      1.38            -0.1        1.33            -0.1        1.32        perf-profile.self.cycles-pp.do_vmi_align_munmap
      0.89            -0.1        0.83            -0.1        0.83        perf-profile.self.cycles-pp.___slab_alloc
      0.62            -0.1        0.56 ±  2%      -0.1        0.56 ±  2%  perf-profile.self.cycles-pp.mremap
      1.00            -0.1        0.95            -0.1        0.95        perf-profile.self.cycles-pp.mas_preallocate
      0.98            -0.1        0.93            -0.0        0.93        perf-profile.self.cycles-pp.move_ptes
      0.99            -0.1        0.94            -0.1        0.93        perf-profile.self.cycles-pp.mas_prev_slot
      1.09            -0.0        1.04 ±  2%      -0.0        1.05        perf-profile.self.cycles-pp.__cond_resched
      0.94            -0.0        0.90            -0.1        0.89        perf-profile.self.cycles-pp.vm_area_free_rcu_cb
      0.85            -0.0        0.80            -0.0        0.81        perf-profile.self.cycles-pp.mas_pop_node
      0.77            -0.0        0.72            -0.0        0.73        perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.68            -0.0        0.63            -0.0        0.64        perf-profile.self.cycles-pp.__split_vma
      1.17            -0.0        1.13            -0.1        1.12        perf-profile.self.cycles-pp.clear_bhb_loop
      0.95            -0.0        0.91            -0.0        0.91        perf-profile.self.cycles-pp.mas_leaf_max_gap
      0.79            -0.0        0.75            -0.0        0.76        perf-profile.self.cycles-pp.mas_wr_store_entry
      0.44            -0.0        0.40            -0.0        0.40        perf-profile.self.cycles-pp.do_munmap
      1.22            -0.0        1.18            -0.0        1.19        perf-profile.self.cycles-pp.move_vma
      0.89            -0.0        0.86            -0.0        0.86        perf-profile.self.cycles-pp.mas_store_gfp
      0.45            -0.0        0.42            -0.0        0.42        perf-profile.self.cycles-pp.mas_wr_end_piv
      0.43 ±  2%      -0.0        0.40            -0.0        0.39        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.78            -0.0        0.75            -0.0        0.76        perf-profile.self.cycles-pp.userfaultfd_unmap_complete
      0.66            -0.0        0.63            -0.0        0.63        perf-profile.self.cycles-pp.mas_store_prealloc
      1.49            -0.0        1.46            -0.0        1.45        perf-profile.self.cycles-pp.kmem_cache_free
      0.60            -0.0        0.58            -0.0        0.58        perf-profile.self.cycles-pp.unmap_region
      0.86            -0.0        0.83            -0.0        0.83        perf-profile.self.cycles-pp.move_page_tables
      0.43 ±  4%      -0.0        0.40            -0.0        0.42 ±  4%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
      0.99            -0.0        0.97            -0.0        0.97        perf-profile.self.cycles-pp.mt_find
      0.36 ±  3%      -0.0        0.33 ±  2%      -0.0        0.33 ±  2%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.71            -0.0        0.68            -0.0        0.68        perf-profile.self.cycles-pp.unmap_page_range
      0.55            -0.0        0.52            -0.0        0.51        perf-profile.self.cycles-pp.get_old_pud
      0.49            -0.0        0.47            -0.0        0.47        perf-profile.self.cycles-pp.find_vma_prev
      0.27            -0.0        0.25            -0.0        0.25 ±  2%  perf-profile.self.cycles-pp.mas_prev_setup
      0.41            -0.0        0.39            -0.0        0.39        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.61            -0.0        0.58            -0.0        0.58        perf-profile.self.cycles-pp.copy_vma
      0.47            -0.0        0.45 ±  2%      -0.0        0.45        perf-profile.self.cycles-pp.flush_tlb_mm_range
      0.37 ±  6%      -0.0        0.35 ±  2%      -0.0        0.35        perf-profile.self.cycles-pp._raw_spin_lock
      0.42 ±  2%      -0.0        0.40 ±  2%      -0.0        0.40        perf-profile.self.cycles-pp.rcu_segcblist_enqueue
      0.27            -0.0        0.25 ±  2%      -0.0        0.25 ±  2%  perf-profile.self.cycles-pp.mas_put_in_tree
      0.39            -0.0        0.37            -0.0        0.37        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.44            -0.0        0.42            -0.0        0.42        perf-profile.self.cycles-pp.mas_update_gap
      0.49            -0.0        0.47            -0.0        0.48 ±  2%  perf-profile.self.cycles-pp.refill_obj_stock
      0.27 ±  2%      -0.0        0.25 ±  2%      -0.0        0.25        perf-profile.self.cycles-pp.tlb_finish_mmu
      0.34            -0.0        0.32            -0.0        0.32 ±  2%  perf-profile.self.cycles-pp.zap_pmd_range
      0.48            -0.0        0.46            -0.0        0.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.28            -0.0        0.26            -0.0        0.26        perf-profile.self.cycles-pp.mas_alloc_nodes
      0.24 ±  2%      -0.0        0.22            -0.0        0.22        perf-profile.self.cycles-pp.mas_prev
      0.14 ±  3%      -0.0        0.12 ±  2%      -0.0        0.12 ±  2%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.26            -0.0        0.24            -0.0        0.24        perf-profile.self.cycles-pp.__rb_insert_augmented
      0.40            -0.0        0.39            -0.0        0.39        perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.28            -0.0        0.26 ±  3%      -0.0        0.26        perf-profile.self.cycles-pp.mas_prev_range
      0.33 ±  2%      -0.0        0.32            -0.0        0.31        perf-profile.self.cycles-pp.zap_pte_range
      0.28            -0.0        0.26            -0.0        0.26        perf-profile.self.cycles-pp.flush_tlb_func
      0.44            -0.0        0.42 ±  2%      -0.0        0.42        perf-profile.self.cycles-pp.__pte_offset_map
      0.22            -0.0        0.21 ±  2%      -0.0        0.21 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.17            -0.0        0.16            -0.0        0.16        perf-profile.self.cycles-pp.__thp_vma_allowable_orders
      0.10            -0.0        0.09            -0.0        0.09 ±  3%  perf-profile.self.cycles-pp.mod_node_page_state
      0.06            -0.0        0.05            -0.0        0.05        perf-profile.self.cycles-pp.vma_dup_policy
      0.06 ±  5%      +0.0        0.07            +0.0        0.07 ±  4%  perf-profile.self.cycles-pp.mm_get_unmapped_area_vmflags
      0.11 ±  4%      +0.0        0.12 ±  4%      +0.0        0.12 ±  2%  perf-profile.self.cycles-pp.free_pgd_range
      0.21            +0.0        0.22 ±  2%      +0.0        0.22        perf-profile.self.cycles-pp.thp_get_unmapped_area_vmflags
      0.45            +0.0        0.48            +0.0        0.48        perf-profile.self.cycles-pp.do_vmi_munmap
      0.27            +0.0        0.32            +0.0        0.31        perf-profile.self.cycles-pp.free_pgtables
      0.36 ±  2%      +0.1        0.44            +0.1        0.45        perf-profile.self.cycles-pp.unlink_anon_vmas
      1.06            +0.1        1.19            +0.1        1.19        perf-profile.self.cycles-pp.mas_next_slot
      1.49            +0.5        2.01            +0.5        2.02        perf-profile.self.cycles-pp.mas_find
      0.00            +1.4        1.38            +1.4        1.38        perf-profile.self.cycles-pp.can_modify_mm
      3.15            +2.1        5.23            +2.1        5.22        perf-profile.self.cycles-pp.mas_walk

> 
>                 Linus

>  arch/powerpc/include/asm/mmu_context.h |  9 ---------
>  arch/powerpc/kernel/vdso.c             | 17 +++++++++++++++--
>  arch/x86/include/asm/mmu_context.h     |  5 -----
>  include/asm-generic/mm_hooks.h         | 11 +++--------
>  include/linux/mm_types.h               |  2 ++
>  mm/mmap.c                              | 15 ++++++---------
>  6 files changed, 26 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 37bffa0f7918..a334a1368848 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -260,15 +260,6 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
>  
>  extern void arch_exit_mmap(struct mm_struct *mm);
>  
> -static inline void arch_unmap(struct mm_struct *mm,
> -			      unsigned long start, unsigned long end)
> -{
> -	unsigned long vdso_base = (unsigned long)mm->context.vdso;
> -
> -	if (start <= vdso_base && vdso_base < end)
> -		mm->context.vdso = NULL;
> -}
> -
>  #ifdef CONFIG_PPC_MEM_KEYS
>  bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
>  			       bool execute, bool foreign);
> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
> index 7a2ff9010f17..6fa041a6690a 100644
> --- a/arch/powerpc/kernel/vdso.c
> +++ b/arch/powerpc/kernel/vdso.c
> @@ -81,6 +81,13 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
>  	return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
>  }
>  
> +static int vvar_close(const struct vm_special_mapping *sm,
> +		      struct vm_area_struct *vma)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	mm->context.vdso = NULL;
> +}
> +
>  static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
>  			     struct vm_area_struct *vma, struct vm_fault *vmf);
>  
> @@ -92,11 +99,13 @@ static struct vm_special_mapping vvar_spec __ro_after_init = {
>  static struct vm_special_mapping vdso32_spec __ro_after_init = {
>  	.name = "[vdso]",
>  	.mremap = vdso32_mremap,
> +	.close = vvar_close,
>  };
>  
>  static struct vm_special_mapping vdso64_spec __ro_after_init = {
>  	.name = "[vdso]",
>  	.mremap = vdso64_mremap,
> +	.close = vvar_close,
>  };
>  
>  #ifdef CONFIG_TIME_NS
> @@ -207,8 +216,10 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
>  	vma = _install_special_mapping(mm, vdso_base, vvar_size,
>  				       VM_READ | VM_MAYREAD | VM_IO |
>  				       VM_DONTDUMP | VM_PFNMAP, &vvar_spec);
> -	if (IS_ERR(vma))
> +	if (IS_ERR(vma)) {
> +		mm->context.vdso = NULL;
>  		return PTR_ERR(vma);
> +	}
>  
>  	/*
>  	 * our vma flags don't have VM_WRITE so by default, the process isn't
> @@ -223,8 +234,10 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
>  	vma = _install_special_mapping(mm, vdso_base + vvar_size, vdso_size,
>  				       VM_READ | VM_EXEC | VM_MAYREAD |
>  				       VM_MAYWRITE | VM_MAYEXEC, vdso_spec);
> -	if (IS_ERR(vma))
> +	if (IS_ERR(vma)) {
> +		mm->context.vdso = NULL;
>  		do_munmap(mm, vdso_base, vvar_size, NULL);
> +	}
>  
>  	return PTR_ERR_OR_ZERO(vma);
>  }
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 8dac45a2c7fc..80f2a3187aa6 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -232,11 +232,6 @@ static inline bool is_64bit_mm(struct mm_struct *mm)
>  }
>  #endif
>  
> -static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
> -			      unsigned long end)
> -{
> -}
> -
>  /*
>   * We only want to enforce protection keys on the current process
>   * because we effectively have no access to PKRU for other
> diff --git a/include/asm-generic/mm_hooks.h b/include/asm-generic/mm_hooks.h
> index 4dbb177d1150..6eea3b3c1e65 100644
> --- a/include/asm-generic/mm_hooks.h
> +++ b/include/asm-generic/mm_hooks.h
> @@ -1,8 +1,8 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  /*
> - * Define generic no-op hooks for arch_dup_mmap, arch_exit_mmap
> - * and arch_unmap to be included in asm-FOO/mmu_context.h for any
> - * arch FOO which doesn't need to hook these.
> + * Define generic no-op hooks for arch_dup_mmap and arch_exit_mmap
> + * to be included in asm-FOO/mmu_context.h for any arch FOO which
> + * doesn't need to hook these.
>   */
>  #ifndef _ASM_GENERIC_MM_HOOKS_H
>  #define _ASM_GENERIC_MM_HOOKS_H
> @@ -17,11 +17,6 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
>  {
>  }
>  
> -static inline void arch_unmap(struct mm_struct *mm,
> -			unsigned long start, unsigned long end)
> -{
> -}
> -
>  static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
>  		bool write, bool execute, bool foreign)
>  {
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 485424979254..ef32d87a3adc 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1313,6 +1313,8 @@ struct vm_special_mapping {
>  
>  	int (*mremap)(const struct vm_special_mapping *sm,
>  		     struct vm_area_struct *new_vma);
> +	void (*close)(const struct vm_special_mapping *sm,
> +		      struct vm_area_struct *vma);
>  };
>  
>  enum tlb_flush_reason {
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d0dfc85b209b..adaaf1ef197a 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2789,7 +2789,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>   *
>   * This function takes a @mas that is either pointing to the previous VMA or set
>   * to MA_START and sets it up to remove the mapping(s).  The @len will be
> - * aligned and any arch_unmap work will be preformed.
> + * aligned.
>   *
>   * Return: 0 on success and drops the lock if so directed, error and leaves the
>   * lock held otherwise.
> @@ -2809,16 +2809,12 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
>  		return -EINVAL;
>  
>  	/*
> -	 * Check if memory is sealed before arch_unmap.
> -	 * Prevent unmapping a sealed VMA.
> +	 * Check if memory is sealed, prevent unmapping a sealed VMA.
>  	 * can_modify_mm assumes we have acquired the lock on MM.
>  	 */
>  	if (unlikely(!can_modify_mm(mm, start, end)))
>  		return -EPERM;
>  
> -	 /* arch_unmap() might do unmaps itself.  */
> -	arch_unmap(mm, start, end);
> -
>  	/* Find the first overlapping VMA */
>  	vma = vma_find(vmi, end);
>  	if (!vma) {
> @@ -3232,14 +3228,12 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	struct mm_struct *mm = vma->vm_mm;
>  
>  	/*
> -	 * Check if memory is sealed before arch_unmap.
> -	 * Prevent unmapping a sealed VMA.
> +	 * Check if memory is sealed, prevent unmapping a sealed VMA.
>  	 * can_modify_mm assumes we have acquired the lock on MM.
>  	 */
>  	if (unlikely(!can_modify_mm(mm, start, end)))
>  		return -EPERM;
>  
> -	arch_unmap(mm, start, end);
>  	return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
>  }
>  
> @@ -3624,6 +3618,9 @@ static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
>   */
>  static void special_mapping_close(struct vm_area_struct *vma)
>  {
> +	const struct vm_special_mapping *sm = vma->vm_private_data;
> +	if (sm->close)
> +		sm->close(sm, vma);
>  }
>  
>  static const char *special_mapping_name(struct vm_area_struct *vma)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06  2:17             ` Linus Torvalds
@ 2024-08-06 12:03               ` Michael Ellerman
  2024-08-06 14:43                 ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Michael Ellerman @ 2024-08-06 12:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Xu, Nicholas Piggin, Christophe Leroy, Pedro Falcato,
	kernel test robot, Jeff Xu, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Kees Cook, Liam R. Howlett, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jonathan Corbet,
	Jorge Lucangeli Obes, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Mon, 5 Aug 2024 at 19:14, Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> Needs a slight tweak to compile, vvar_close() needs to return void.
>
> Ack, shows just how untested it was.
>
>> And should probably be renamed vdso_close().
>
> .. and that was due to the initial confusion that I then fixed, but
> didn't fix the naming.

Ack.

> So yes, those fixes look ObviouslyCorrect(tm).

Needs another slight tweak to work correctly. Diff below.

With that our sigreturn_vdso selftest passes, and the CRIU vdso tests
pass also. So LGTM.

I'm not sure of the urgency on this, do you want to apply it directly?
If so feel free to add my tested-by/sob etc.

Or should I turn it into a series and post it?

cheers


diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 431b46976db8..ed5ac4af4d83 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -85,6 +85,15 @@ static void vdso_close(const struct vm_special_mapping *sm,
                        struct vm_area_struct *vma)
 {
 	struct mm_struct *mm = vma->vm_mm;
+
+	/*
+	 * close() is called for munmap() but also for mremap(). In the mremap()
+	 * case the vdso pointer has already been updated by the mremap() hook
+	 * above, so it must not be set to NULL here. 
+	 */
+	if (vma->vm_start != (unsigned long)mm->context.vdso)
+		return;
+
 	mm->context.vdso = NULL;
 }
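
For illustration only, the mremap()/close() ordering that the check above
guards against can be modelled in plain userspace C. This is a sketch, not
kernel code: struct mm_model, struct vma_model, model_vdso_mremap() and
model_vdso_close() are hypothetical stand-ins, and the assumption is simply
that the mremap() hook records the new start address before close() runs on
the old VMA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct mm_model {
	void *vdso;			/* models mm->context.vdso */
};

struct vma_model {
	unsigned long vm_start;
	struct mm_model *vm_mm;
};

/* Models the mremap() hook: record where the vdso now lives. */
void model_vdso_mremap(struct vma_model *new_vma)
{
	assert(new_vma->vm_mm != NULL);
	new_vma->vm_mm->vdso = (void *)new_vma->vm_start;
}

/*
 * Models the fixed close() hook: only clear the pointer when the VMA
 * being closed is still the one the pointer refers to.
 */
void model_vdso_close(struct vma_model *vma)
{
	struct mm_model *mm = vma->vm_mm;

	assert(mm != NULL);
	if (vma->vm_start != (unsigned long)mm->vdso)
		return;		/* pointer already moved by mremap() */

	mm->vdso = NULL;
}
```

In this model, closing the old VMA after a move leaves the pointer at the new
address, while closing the VMA the pointer actually refers to clears it.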
 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06  6:04           ` Oliver Sang
@ 2024-08-06 14:38             ` Linus Torvalds
  2024-08-06 21:37             ` Pedro Falcato
  1 sibling, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2024-08-06 14:38 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Jeff Xu, Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Pedro Falcato, Jeff Xu, oe-lkp, lkp, linux-kernel, Andrew Morton,
	Kees Cook, Liam R. Howlett, Dave Hansen, Greg Kroah-Hartman,
	Guenter Roeck, Jann Horn, Jonathan Corbet, Jorge Lucangeli Obes,
	Matthew Wilcox, Muhammad Usama Anjum, Stephen Röttger,
	Suren Baghdasaryan, Amer Al Shanawany, Javier Carrasco,
	Shuah Khan, linux-api, linux-mm, ying.huang, feng.tang,
	fengwei.yin

On Mon, 5 Aug 2024 at 23:05, Oliver Sang <oliver.sang@intel.com> wrote:
>
> > New version - still untested, but now I've read through it one more
> > time - attached.
>
> We tested this version by applying it directly upon 8be7258aad, but it seems
> to have little impact on performance. There is still a similar regression
> when comparing to ff388fe5c4.

Note that that patch (and Michael's fixes for ppc on top) in itself
doesn't fix any performance issue.

But getting rid of arch_unmap() means that now the can_modify_mm() in
do_vmi_munmap() is right above the "vma_find()" (and can in fact be
moved below it and into do_vmi_align_munmap), and that means that at
least the unmap paths don't need the vma lookup of can_modify_mm() at
all, because they've done their own.

IOW, the "arch_unmap()" removal did nothing on its own; it was purely
preparatory work for getting rid of some of the can_modify_mm() costs.
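
The double-lookup point can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: the toy_* names are hypothetical, a sorted array stands in for the VMA tree, and -1 stands in for the error the kernel would return for a sealed range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy VMA: half-open range [start, end) plus a "sealed" flag. */
struct toy_vma {
	unsigned long start, end;
	bool sealed;
};

/* Toy address space: VMAs sorted by start, not overlapping. */
struct toy_mm {
	const struct toy_vma *vmas;
	size_t nr;
	unsigned long lookups;	/* how many range lookups were done */
};

/* Stand-in for a tree lookup: index of the first VMA overlapping
 * [start, end), or mm->nr if there is none. */
static size_t toy_find(struct toy_mm *mm, unsigned long start, unsigned long end)
{
	mm->lookups++;
	for (size_t i = 0; i < mm->nr; i++)
		if (mm->vmas[i].end > start && mm->vmas[i].start < end)
			return i;
	return mm->nr;
}

/* True if no VMA from index i onwards that starts below end is sealed. */
static bool toy_unsealed_from(const struct toy_mm *mm, size_t i, unsigned long end)
{
	for (; i < mm->nr && mm->vmas[i].start < end; i++)
		if (mm->vmas[i].sealed)
			return false;
	return true;
}

/* Separate seal check: it does its own lookup before the real one. */
static int toy_unmap_two_lookups(struct toy_mm *mm, unsigned long start, unsigned long end)
{
	size_t i = toy_find(mm, start, end);

	if (!toy_unsealed_from(mm, i, end))
		return -1;		/* sealed: refuse */
	i = toy_find(mm, start, end);	/* same range looked up again */
	return i != mm->nr;		/* 1 if there is something to unmap */
}

/* Folded seal check: reuse the result of the one lookup. */
static int toy_unmap_one_lookup(struct toy_mm *mm, unsigned long start, unsigned long end)
{
	size_t i = toy_find(mm, start, end);

	if (!toy_unsealed_from(mm, i, end))
		return -1;
	return i != mm->nr;
}

/* Both shapes must agree on the outcome; returns how many extra
 * lookups the separate check cost for this range. */
static unsigned long toy_extra_lookups(const struct toy_vma *vmas, size_t nr,
				       unsigned long start, unsigned long end)
{
	struct toy_mm a = { vmas, nr, 0 }, b = { vmas, nr, 0 };
	int ra = toy_unmap_two_lookups(&a, start, end);
	int rb = toy_unmap_one_lookup(&b, start, end);

	assert(ra == rb);
	return a.lookups - b.lookups;
}
```

In this model the unsealed case, which is the common one in the pagemove test, pays for one extra lookup per unmap when the check is separate, and nothing extra when the check is folded into the lookup the unmap path already does.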

The call to can_modify_mm() in mremap_to() is a bit harder to get rid
of. Unless we just say "mremap will unmap the destination even if the
mremap source is sealed".

            Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06 12:03               ` Michael Ellerman
@ 2024-08-06 14:43                 ` Linus Torvalds
  2024-08-07 12:26                   ` Michael Ellerman
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2024-08-06 14:43 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Jeff Xu, Nicholas Piggin, Christophe Leroy, Pedro Falcato,
	kernel test robot, Jeff Xu, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Kees Cook, Liam R. Howlett, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jonathan Corbet,
	Jorge Lucangeli Obes, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin

On Tue, 6 Aug 2024 at 05:03, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Or should I turn it into a series and post it?

I think post it as a single working patch rather than as a series that
breaks things and then fixes it.

And considering that you did all the testing and found the problems,
just take ownership of it and make it a "Suggested-by: Linus" or
something.

That's what my original patch was anyway: "something like this".

            Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06  1:44   ` Oliver Sang
@ 2024-08-06 14:54     ` Jeff Xu
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Xu @ 2024-08-06 14:54 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Jeff Xu, oe-lkp, lkp, linux-kernel, Andrew Morton, Kees Cook,
	Liam R. Howlett, Pedro Falcato, Dave Hansen, Greg Kroah-Hartman,
	Guenter Roeck, Jann Horn, Jonathan Corbet, Jorge Lucangeli Obes,
	Linus Torvalds, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin

On Mon, Aug 5, 2024 at 6:44 PM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Jeff,
>
> On Mon, Aug 05, 2024 at 09:58:33AM -0700, Jeff Xu wrote:
> > On Sun, Aug 4, 2024 at 1:59 AM kernel test robot <oliver.sang@intel.com> wrote:
> > >
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a -4.4% regression of stress-ng.pagemove.page_remaps_per_sec on:
> > >
> > >
> > > commit: 8be7258aad44b5e25977a98db136f677fa6f4370 ("mseal: add mseal syscall")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > testcase: stress-ng
> > > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > > parameters:
> > >
> > >         nr_threads: 100%
> > >         testtime: 60s
> > >         test: pagemove
> > >         cpufreq_governor: performance
> > >
> > >
> > > In addition to that, the commit also has significant impact on the following tests:
> > >
> > > +------------------+---------------------------------------------------------------------------------------------+
> > > | testcase: change | stress-ng: stress-ng.pkey.ops_per_sec -3.6% regression                                      |
> > > | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
> > > | test parameters  | cpufreq_governor=performance                                                                |
> > > |                  | nr_threads=100%                                                                             |
> > > |                  | test=pkey                                                                                   |
> > > |                  | testtime=60s                                                                                |
> > > +------------------+---------------------------------------------------------------------------------------------+
> > >
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > | Closes: https://lore.kernel.org/oe-lkp/202408041602.caa0372-oliver.sang@intel.com
> > >
> > >
> > > Details are as below:
> > > -------------------------------------------------------------------------------------------------->
> > >
> > >
> > > The kernel config and materials to reproduce are available at:
> > > https://download.01.org/0day-ci/archive/20240804/202408041602.caa0372-oliver.sang@intel.com
> > >
> > There is an error when I try to reproduce the test:
>
> what's your os? we support some distributions
> https://github.com/intel/lkp-tests?tab=readme-ov-file#supported-distributions
>
> >
> > bin/lkp install job.yaml
> >
> > --------------------------------------------------------
> > Some packages could not be installed. This may mean that you have
> > requested an impossible situation or if you are using the unstable
> > distribution that some required packages have not yet been created
> > or been moved out of Incoming.
> > The following information may help to resolve the situation:
> >
> > The following packages have unmet dependencies:
> >  libdw1 : Depends: libelf1 (= 0.190-1+b1)
> >  libdw1t64 : Breaks: libdw1 (< 0.191-2)
> > E: Unable to correct problems, you have held broken packages.
> > Cannot install some packages of perf-c2c depends
> > -----------------------------------------------------------------------------------------
> >
> > And where is stress-ng.pagemove.page_remaps_per_sec test implemented,
> > is that part of lkp-tests ?
>
> stress-ng is in https://github.com/ColinIanKing/stress-ng
>
I will try this route first.

Thanks
-Jeff

> >
> > Thanks
> > -Jeff
> >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pagemove/stress-ng/60s
> > >
> > > commit:
> > >   ff388fe5c4 ("mseal: wire up mseal syscall")
> > >   8be7258aad ("mseal: add mseal syscall")
> > >
> > > ff388fe5c481d39c 8be7258aad44b5e25977a98db13
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >   41625945            -4.3%   39842322        proc-vmstat.numa_hit
> > >   41559175            -4.3%   39774160        proc-vmstat.numa_local
> > >   77484314            -4.4%   74105555        proc-vmstat.pgalloc_normal
> > >   77205752            -4.4%   73826672        proc-vmstat.pgfree
> > >   18361466            -4.2%   17596652        stress-ng.pagemove.ops
> > >     306014            -4.2%     293262        stress-ng.pagemove.ops_per_sec
> > >     205312            -4.4%     196176        stress-ng.pagemove.page_remaps_per_sec
> > >       4961            +1.0%       5013        stress-ng.time.percent_of_cpu_this_job_got
> > >       2917            +1.2%       2952        stress-ng.time.system_time
> > >       1.07            -6.6%       1.00        perf-stat.i.MPKI
> > >  3.354e+10            +3.5%  3.473e+10        perf-stat.i.branch-instructions
> > >  1.795e+08            -4.2%  1.719e+08        perf-stat.i.cache-misses
> > >  2.376e+08            -4.1%  2.279e+08        perf-stat.i.cache-references
> > >       1.13            -3.0%       1.10        perf-stat.i.cpi
> > >       1077            +4.3%       1124        perf-stat.i.cycles-between-cache-misses
> > >  1.717e+11            +2.7%  1.762e+11        perf-stat.i.instructions
> > >       0.88            +3.1%       0.91        perf-stat.i.ipc
> > >       1.05            -6.8%       0.97        perf-stat.overall.MPKI
> > >       0.25 ±  2%      -0.0        0.24        perf-stat.overall.branch-miss-rate%
> > >       1.13            -3.0%       1.10        perf-stat.overall.cpi
> > >       1084            +4.0%       1127        perf-stat.overall.cycles-between-cache-misses
> > >       0.88            +3.1%       0.91        perf-stat.overall.ipc
> > >  3.298e+10            +3.5%  3.415e+10        perf-stat.ps.branch-instructions
> > >  1.764e+08            -4.3%  1.689e+08        perf-stat.ps.cache-misses
> > >  2.336e+08            -4.1%   2.24e+08        perf-stat.ps.cache-references
> > >     194.57            -2.4%     189.96 ±  2%  perf-stat.ps.cpu-migrations
> > >  1.688e+11            +2.7%  1.733e+11        perf-stat.ps.instructions
> > >  1.036e+13            +3.0%  1.068e+13        perf-stat.total.instructions
> > >      75.12            -1.9       73.22        perf-profile.calltrace.cycles-pp.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> > >      36.84            -1.6       35.29        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
> > >      24.90            -1.2       23.72        perf-profile.calltrace.cycles-pp.copy_vma.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >      19.89            -0.9       18.98        perf-profile.calltrace.cycles-pp.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> > >      10.56 ±  2%      -0.8        9.78 ±  2%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
> > >      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
> > >      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> > >      10.57 ±  2%      -0.8        9.80 ±  2%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> > >      10.52 ±  2%      -0.8        9.75 ±  2%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
> > >      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
> > >      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> > >      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> > >      14.75            -0.7       14.07        perf-profile.calltrace.cycles-pp.vma_merge.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
> > >       1.50            -0.6        0.94        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
> > >       5.88 ±  2%      -0.4        5.47 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
> > >       7.80            -0.3        7.47        perf-profile.calltrace.cycles-pp.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       4.55 ±  2%      -0.3        4.24 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
> > >       6.76            -0.3        6.45        perf-profile.calltrace.cycles-pp.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
> > >       6.15            -0.3        5.86        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap
> > >       8.22            -0.3        7.93        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> > >       6.12            -0.3        5.87        perf-profile.calltrace.cycles-pp.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
> > >       5.74            -0.2        5.50        perf-profile.calltrace.cycles-pp.move_ptes.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
> > >       3.16 ±  2%      -0.2        2.94        perf-profile.calltrace.cycles-pp.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
> > >       5.50            -0.2        5.28        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap
> > >       1.36            -0.2        1.14        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
> > >       5.15            -0.2        4.94        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap
> > >       5.51            -0.2        5.31        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_merge.copy_vma.move_vma.__do_sys_mremap
> > >       5.16            -0.2        4.97        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma.move_vma
> > >       2.24            -0.2        2.05        perf-profile.calltrace.cycles-pp.find_vma_prev.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
> > >       2.60 ±  2%      -0.2        2.42 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs
> > >       4.67            -0.2        4.49        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma
> > >       3.41            -0.2        3.23        perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
> > >       3.00            -0.2        2.83 ±  2%  perf-profile.calltrace.cycles-pp.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
> > >       0.96            -0.2        0.80        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.do_munmap.mremap_to
> > >       4.04            -0.2        3.88        perf-profile.calltrace.cycles-pp.vm_area_dup.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
> > >       3.20 ±  2%      -0.2        3.04 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
> > >       3.53            -0.1        3.38        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma.__do_sys_mremap
> > >       3.40            -0.1        3.26        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma.__do_sys_mremap
> > >       2.20 ±  2%      -0.1        2.06 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap
> > >       1.84 ±  3%      -0.1        1.71 ±  3%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap
> > >       1.78 ±  2%      -0.1        1.65 ±  3%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_merge.copy_vma.move_vma.__do_sys_mremap
> > >       2.69            -0.1        2.56        perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap
> > >       1.78 ±  2%      -0.1        1.66 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
> > >       1.36 ±  2%      -0.1        1.23 ±  2%  perf-profile.calltrace.cycles-pp.rcu_cblist_dequeue.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
> > >       0.95            -0.1        0.83        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap
> > >       3.29            -0.1        3.17        perf-profile.calltrace.cycles-pp.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> > >       2.08            -0.1        1.96        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_merge.copy_vma.move_vma.__do_sys_mremap
> > >       1.43 ±  3%      -0.1        1.32 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_merge.copy_vma.move_vma
> > >       2.21            -0.1        2.10        perf-profile.calltrace.cycles-pp.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
> > >       2.47            -0.1        2.36        perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma
> > >       2.21            -0.1        2.12        perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables
> > >       1.41            -0.1        1.32        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap
> > >       1.26            -0.1        1.18        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap
> > >       1.82            -0.1        1.75        perf-profile.calltrace.cycles-pp.vma_link.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
> > >       0.71            -0.1        0.63        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge.copy_vma.move_vma.__do_sys_mremap
> > >       1.29            -0.1        1.22        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma
> > >       0.61            -0.1        0.54        perf-profile.calltrace.cycles-pp.mas_prev_setup.mas_prev.vma_merge.copy_vma.move_vma
> > >       1.36            -0.1        1.29        perf-profile.calltrace.cycles-pp.kmem_cache_free.unlink_anon_vmas.free_pgtables.unmap_region.do_vmi_align_munmap
> > >       1.40            -0.1        1.33        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma.move_vma
> > >       0.70            -0.1        0.64        perf-profile.calltrace.cycles-pp.mas_walk.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
> > >       1.23            -0.1        1.17        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma
> > >       1.66            -0.1        1.60        perf-profile.calltrace.cycles-pp.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       1.16            -0.1        1.10        perf-profile.calltrace.cycles-pp.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
> > >       0.96            -0.1        0.90        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.unlink_anon_vmas.free_pgtables.unmap_region
> > >       1.14            -0.1        1.08        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
> > >       0.79            -0.1        0.74        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
> > >       1.04            -0.1        1.00        perf-profile.calltrace.cycles-pp.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       0.58            -0.0        0.53        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.mremap
> > >       0.61            -0.0        0.56        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
> > >       0.56            -0.0        0.52        perf-profile.calltrace.cycles-pp.mas_find.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
> > >       0.57            -0.0        0.53 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
> > >       0.78            -0.0        0.74        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
> > >       0.88            -0.0        0.84        perf-profile.calltrace.cycles-pp.mtree_load.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64
> > >       0.70            -0.0        0.66        perf-profile.calltrace.cycles-pp.__call_rcu_common.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> > >       0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> > >       0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.mremap
> > >       0.97            -0.0        0.93        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> > >       1.11            -0.0        1.08        perf-profile.calltrace.cycles-pp.clear_bhb_loop.mremap
> > >       0.75            -0.0        0.72        perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma
> > >       0.74            -0.0        0.71        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap
> > >       0.60 ±  2%      -0.0        0.57        perf-profile.calltrace.cycles-pp.security_mmap_addr.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64
> > >       0.67 ±  2%      -0.0        0.64        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma
> > >       0.82            -0.0        0.79        perf-profile.calltrace.cycles-pp.mtree_load.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> > >       0.63            -0.0        0.60        perf-profile.calltrace.cycles-pp.mas_prev_slot.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
> > >       0.99            -0.0        0.96        perf-profile.calltrace.cycles-pp.mt_find.vma_merge.copy_vma.move_vma.__do_sys_mremap
> > >       0.62 ±  2%      -0.0        0.59        perf-profile.calltrace.cycles-pp.get_old_pud.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
> > >       0.87            -0.0        0.84        perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> > >       0.78            -0.0        0.75        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_link.copy_vma.move_vma.__do_sys_mremap
> > >       0.64            -0.0        0.62        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_link.copy_vma.move_vma.__do_sys_mremap
> > >       0.90            -0.0        0.87        perf-profile.calltrace.cycles-pp.mtree_load.vma_merge.copy_vma.move_vma.__do_sys_mremap
> > >       0.54            -0.0        0.52        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.move_ptes.move_page_tables.move_vma.__do_sys_mremap
> > >       1.04            +0.0        1.08        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
> > >       0.76            +0.1        0.83        perf-profile.calltrace.cycles-pp.__madvise
> > >       0.63            +0.1        0.70        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> > >       0.62            +0.1        0.70        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> > >       0.66            +0.1        0.74        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
> > >       0.66            +0.1        0.74        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> > >      87.74            +0.7       88.45        perf-profile.calltrace.cycles-pp.mremap
> > >       0.00            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.do_munmap
> > >       0.00            +0.9        0.86        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.mremap_to.__do_sys_mremap
> > >      84.88            +0.9       85.77        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mremap
> > >      84.73            +0.9       85.62        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> > >       0.00            +0.9        0.92 ±  2%  perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.move_vma
> > >      83.84            +0.9       84.78        perf-profile.calltrace.cycles-pp.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> > >       0.00            +1.1        1.06        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64
> > >       0.00            +1.2        1.21        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to
> > >       2.07            +1.5        3.55        perf-profile.calltrace.cycles-pp.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       1.58            +1.5        3.07        perf-profile.calltrace.cycles-pp.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64
> > >       0.00            +1.5        1.52        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap
> > >       0.00            +1.6        1.57        perf-profile.calltrace.cycles-pp.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       0.00            +1.7        1.72        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
> > >       0.00            +2.0        2.01        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
> > >       5.39            +2.9        8.32        perf-profile.calltrace.cycles-pp.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
> > >      75.29            -1.9       73.37        perf-profile.children.cycles-pp.move_vma
> > >      37.06            -1.6       35.50        perf-profile.children.cycles-pp.do_vmi_align_munmap
> > >      24.98            -1.2       23.80        perf-profile.children.cycles-pp.copy_vma
> > >      19.99            -1.0       19.02        perf-profile.children.cycles-pp.handle_softirqs
> > >      19.97            -1.0       19.00        perf-profile.children.cycles-pp.rcu_core
> > >      19.95            -1.0       18.98        perf-profile.children.cycles-pp.rcu_do_batch
> > >      19.98            -0.9       19.06        perf-profile.children.cycles-pp.__split_vma
> > >      17.55            -0.8       16.76        perf-profile.children.cycles-pp.kmem_cache_free
> > >      10.56 ±  2%      -0.8        9.79 ±  2%  perf-profile.children.cycles-pp.run_ksoftirqd
> > >      10.57 ±  2%      -0.8        9.80 ±  2%  perf-profile.children.cycles-pp.smpboot_thread_fn
> > >      15.38            -0.8       14.62        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
> > >      10.62 ±  2%      -0.8        9.85 ±  2%  perf-profile.children.cycles-pp.kthread
> > >      10.62 ±  2%      -0.8        9.86 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
> > >      10.62 ±  2%      -0.8        9.86 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
> > >      15.14            -0.7       14.44        perf-profile.children.cycles-pp.vma_merge
> > >      12.08            -0.5       11.55        perf-profile.children.cycles-pp.__slab_free
> > >      12.11            -0.5       11.62        perf-profile.children.cycles-pp.mas_wr_store_entry
> > >      10.86            -0.5       10.39        perf-profile.children.cycles-pp.vm_area_dup
> > >      11.89            -0.5       11.44        perf-profile.children.cycles-pp.mas_store_prealloc
> > >       8.49            -0.4        8.06        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
> > >       9.88            -0.4        9.49        perf-profile.children.cycles-pp.mas_wr_node_store
> > >       7.91            -0.3        7.58        perf-profile.children.cycles-pp.move_page_tables
> > >       6.06            -0.3        5.78        perf-profile.children.cycles-pp.vm_area_free_rcu_cb
> > >       8.28            -0.3        8.00        perf-profile.children.cycles-pp.unmap_region
> > >       6.69            -0.3        6.42        perf-profile.children.cycles-pp.vma_complete
> > >       5.06            -0.3        4.80        perf-profile.children.cycles-pp.mas_preallocate
> > >       5.82            -0.2        5.57        perf-profile.children.cycles-pp.move_ptes
> > >       4.24            -0.2        4.01        perf-profile.children.cycles-pp.anon_vma_clone
> > >       3.50            -0.2        3.30        perf-profile.children.cycles-pp.down_write
> > >       2.44            -0.2        2.25        perf-profile.children.cycles-pp.find_vma_prev
> > >       3.46            -0.2        3.28        perf-profile.children.cycles-pp.___slab_alloc
> > >       3.45            -0.2        3.27        perf-profile.children.cycles-pp.free_pgtables
> > >       2.54            -0.2        2.37        perf-profile.children.cycles-pp.rcu_cblist_dequeue
> > >       3.35            -0.2        3.18        perf-profile.children.cycles-pp.__memcg_slab_free_hook
> > >       2.93            -0.2        2.78        perf-profile.children.cycles-pp.mas_alloc_nodes
> > >       2.28 ±  2%      -0.2        2.12 ±  2%  perf-profile.children.cycles-pp.vma_prepare
> > >       3.46            -0.1        3.32        perf-profile.children.cycles-pp.flush_tlb_mm_range
> > >       3.41            -0.1        3.27 ±  2%  perf-profile.children.cycles-pp.mod_objcg_state
> > >       2.76            -0.1        2.63        perf-profile.children.cycles-pp.unlink_anon_vmas
> > >       3.41            -0.1        3.28        perf-profile.children.cycles-pp.mas_store_gfp
> > >       2.21            -0.1        2.09        perf-profile.children.cycles-pp.__cond_resched
> > >       2.04            -0.1        1.94        perf-profile.children.cycles-pp.allocate_slab
> > >       2.10            -0.1        2.00        perf-profile.children.cycles-pp.__call_rcu_common
> > >       2.51            -0.1        2.40        perf-profile.children.cycles-pp.flush_tlb_func
> > >       1.04            -0.1        0.94        perf-profile.children.cycles-pp.mas_prev
> > >       2.71            -0.1        2.61        perf-profile.children.cycles-pp.mtree_load
> > >       2.23            -0.1        2.14        perf-profile.children.cycles-pp.native_flush_tlb_one_user
> > >       0.22 ±  5%      -0.1        0.13 ± 13%  perf-profile.children.cycles-pp.vm_stat_account
> > >       0.95            -0.1        0.87        perf-profile.children.cycles-pp.mas_prev_setup
> > >       1.65            -0.1        1.57        perf-profile.children.cycles-pp.mas_wr_walk
> > >       1.84            -0.1        1.76        perf-profile.children.cycles-pp.up_write
> > >       1.27            -0.1        1.20        perf-profile.children.cycles-pp.mas_prev_slot
> > >       1.84            -0.1        1.77        perf-profile.children.cycles-pp.vma_link
> > >       1.39            -0.1        1.32        perf-profile.children.cycles-pp.shuffle_freelist
> > >       0.96            -0.1        0.90 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
> > >       0.86            -0.1        0.80        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> > >       1.70            -0.1        1.64        perf-profile.children.cycles-pp.__get_unmapped_area
> > >       0.34 ±  3%      -0.1        0.29 ±  5%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
> > >       0.60            -0.0        0.55        perf-profile.children.cycles-pp.entry_SYSCALL_64
> > >       0.92            -0.0        0.87        perf-profile.children.cycles-pp.percpu_counter_add_batch
> > >       1.07            -0.0        1.02        perf-profile.children.cycles-pp.vma_to_resize
> > >       1.59            -0.0        1.54        perf-profile.children.cycles-pp.mas_update_gap
> > >       0.44 ±  2%      -0.0        0.40 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> > >       0.70            -0.0        0.66        perf-profile.children.cycles-pp.syscall_return_via_sysret
> > >       1.13            -0.0        1.09        perf-profile.children.cycles-pp.mt_find
> > >       0.20 ±  6%      -0.0        0.17 ±  9%  perf-profile.children.cycles-pp.cap_vm_enough_memory
> > >       0.99            -0.0        0.95        perf-profile.children.cycles-pp.mas_pop_node
> > >       0.63 ±  2%      -0.0        0.59        perf-profile.children.cycles-pp.security_mmap_addr
> > >       0.62            -0.0        0.59        perf-profile.children.cycles-pp.__put_partials
> > >       1.17            -0.0        1.14        perf-profile.children.cycles-pp.clear_bhb_loop
> > >       0.46            -0.0        0.43 ±  2%  perf-profile.children.cycles-pp.__alloc_pages_noprof
> > >       0.44            -0.0        0.41 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
> > >       0.90            -0.0        0.87        perf-profile.children.cycles-pp.userfaultfd_unmap_complete
> > >       0.64 ±  2%      -0.0        0.62        perf-profile.children.cycles-pp.get_old_pud
> > >       1.07            -0.0        1.05        perf-profile.children.cycles-pp.mas_leaf_max_gap
> > >       0.22 ±  3%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.__rmqueue_pcplist
> > >       0.55            -0.0        0.53        perf-profile.children.cycles-pp.refill_obj_stock
> > >       0.25            -0.0        0.23 ±  3%  perf-profile.children.cycles-pp.rmqueue
> > >       0.48            -0.0        0.45        perf-profile.children.cycles-pp.mremap_userfaultfd_prep
> > >       0.33            -0.0        0.30        perf-profile.children.cycles-pp.free_unref_page
> > >       0.46            -0.0        0.44        perf-profile.children.cycles-pp.setup_object
> > >       0.21 ±  3%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.rmqueue_bulk
> > >       0.31 ±  3%      -0.0        0.29        perf-profile.children.cycles-pp.__vm_enough_memory
> > >       0.40            -0.0        0.38        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> > >       0.36            -0.0        0.35        perf-profile.children.cycles-pp.madvise_vma_behavior
> > >       0.54            -0.0        0.53 ±  2%  perf-profile.children.cycles-pp.mas_wr_end_piv
> > >       0.46            -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
> > >       0.34            -0.0        0.32 ±  2%  perf-profile.children.cycles-pp.mas_destroy
> > >       0.28            -0.0        0.26 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_setup
> > >       0.30            -0.0        0.28        perf-profile.children.cycles-pp.pte_offset_map_nolock
> > >       0.19            -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.__thp_vma_allowable_orders
> > >       0.08 ±  4%      -0.0        0.07        perf-profile.children.cycles-pp.ksm_madvise
> > >       0.17            -0.0        0.16        perf-profile.children.cycles-pp.get_any_partial
> > >       0.08            -0.0        0.07        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
> > >       0.45            +0.0        0.47        perf-profile.children.cycles-pp._raw_spin_lock
> > >       1.10            +0.0        1.14        perf-profile.children.cycles-pp.zap_pte_range
> > >       0.78            +0.1        0.85        perf-profile.children.cycles-pp.__madvise
> > >       0.63            +0.1        0.70        perf-profile.children.cycles-pp.__x64_sys_madvise
> > >       0.62            +0.1        0.70        perf-profile.children.cycles-pp.do_madvise
> > >       0.00            +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.can_modify_mm_madv
> > >       1.32            +0.1        1.46        perf-profile.children.cycles-pp.mas_next_slot
> > >      88.13            +0.7       88.83        perf-profile.children.cycles-pp.mremap
> > >      83.94            +0.9       84.88        perf-profile.children.cycles-pp.__do_sys_mremap
> > >      86.06            +0.9       87.00        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > >      85.56            +1.0       86.54        perf-profile.children.cycles-pp.do_syscall_64
> > >      40.49            +1.4       41.90        perf-profile.children.cycles-pp.do_vmi_munmap
> > >       2.10            +1.5        3.57        perf-profile.children.cycles-pp.do_munmap
> > >       3.62            +2.3        5.90        perf-profile.children.cycles-pp.mas_walk
> > >       5.44            +2.9        8.38        perf-profile.children.cycles-pp.mremap_to
> > >       5.30            +3.1        8.39        perf-profile.children.cycles-pp.mas_find
> > >       0.00            +5.4        5.40        perf-profile.children.cycles-pp.can_modify_mm
> > >      11.46            -0.5       10.96        perf-profile.self.cycles-pp.__slab_free
> > >       4.30            -0.2        4.08        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
> > >       2.51            -0.2        2.34        perf-profile.self.cycles-pp.rcu_cblist_dequeue
> > >       2.41 ±  2%      -0.2        2.25        perf-profile.self.cycles-pp.down_write
> > >       2.21            -0.1        2.11        perf-profile.self.cycles-pp.native_flush_tlb_one_user
> > >       2.37            -0.1        2.28        perf-profile.self.cycles-pp.mtree_load
> > >       1.60            -0.1        1.51        perf-profile.self.cycles-pp.__memcg_slab_free_hook
> > >       0.18 ±  3%      -0.1        0.10 ± 15%  perf-profile.self.cycles-pp.vm_stat_account
> > >       1.25            -0.1        1.18        perf-profile.self.cycles-pp.move_vma
> > >       1.76            -0.1        1.69        perf-profile.self.cycles-pp.mod_objcg_state
> > >       1.42            -0.1        1.35 ±  2%  perf-profile.self.cycles-pp.__call_rcu_common
> > >       1.41            -0.1        1.34        perf-profile.self.cycles-pp.mas_wr_walk
> > >       1.52            -0.1        1.46        perf-profile.self.cycles-pp.up_write
> > >       1.02            -0.1        0.95        perf-profile.self.cycles-pp.mas_prev_slot
> > >       0.96            -0.1        0.90 ±  2%  perf-profile.self.cycles-pp.vm_area_free_rcu_cb
> > >       1.50            -0.1        1.45        perf-profile.self.cycles-pp.kmem_cache_free
> > >       0.69 ±  3%      -0.1        0.64 ±  2%  perf-profile.self.cycles-pp.rcu_all_qs
> > >       1.14 ±  2%      -0.1        1.09        perf-profile.self.cycles-pp.shuffle_freelist
> > >       1.10            -0.1        1.05        perf-profile.self.cycles-pp.__cond_resched
> > >       1.40            -0.0        1.35        perf-profile.self.cycles-pp.do_vmi_align_munmap
> > >       0.99            -0.0        0.94        perf-profile.self.cycles-pp.mas_preallocate
> > >       0.88            -0.0        0.83        perf-profile.self.cycles-pp.___slab_alloc
> > >       0.55            -0.0        0.50        perf-profile.self.cycles-pp.mremap_to
> > >       0.98            -0.0        0.93        perf-profile.self.cycles-pp.move_ptes
> > >       0.78            -0.0        0.74        perf-profile.self.cycles-pp.percpu_counter_add_batch
> > >       0.21 ±  2%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64
> > >       0.44 ±  2%      -0.0        0.40 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> > >       0.92            -0.0        0.89        perf-profile.self.cycles-pp.mas_store_gfp
> > >       0.86            -0.0        0.82        perf-profile.self.cycles-pp.mas_pop_node
> > >       0.50            -0.0        0.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> > >       1.15            -0.0        1.12        perf-profile.self.cycles-pp.clear_bhb_loop
> > >       1.14            -0.0        1.11        perf-profile.self.cycles-pp.vma_merge
> > >       0.66            -0.0        0.63        perf-profile.self.cycles-pp.__split_vma
> > >       0.16 ±  6%      -0.0        0.13 ±  7%  perf-profile.self.cycles-pp.cap_vm_enough_memory
> > >       0.82            -0.0        0.79        perf-profile.self.cycles-pp.mas_wr_store_entry
> > >       0.54 ±  2%      -0.0        0.52        perf-profile.self.cycles-pp.get_old_pud
> > >       0.43            -0.0        0.40        perf-profile.self.cycles-pp.do_munmap
> > >       0.51 ±  2%      -0.0        0.48 ±  2%  perf-profile.self.cycles-pp.security_mmap_addr
> > >       0.50            -0.0        0.48        perf-profile.self.cycles-pp.refill_obj_stock
> > >       0.24            -0.0        0.22        perf-profile.self.cycles-pp.mas_prev
> > >       0.71            -0.0        0.69        perf-profile.self.cycles-pp.unmap_page_range
> > >       0.48            -0.0        0.45        perf-profile.self.cycles-pp.find_vma_prev
> > >       0.42            -0.0        0.40        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> > >       0.66            -0.0        0.64        perf-profile.self.cycles-pp.mas_store_prealloc
> > >       0.31            -0.0        0.29        perf-profile.self.cycles-pp.mas_prev_setup
> > >       0.43            -0.0        0.41        perf-profile.self.cycles-pp.mas_wr_end_piv
> > >       0.78            -0.0        0.76        perf-profile.self.cycles-pp.userfaultfd_unmap_complete
> > >       0.28            -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.mas_put_in_tree
> > >       0.42            -0.0        0.40        perf-profile.self.cycles-pp.mremap_userfaultfd_prep
> > >       0.28            -0.0        0.26        perf-profile.self.cycles-pp.free_pgtables
> > >       0.39            -0.0        0.37        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> > >       0.30 ±  2%      -0.0        0.28        perf-profile.self.cycles-pp.zap_pmd_range
> > >       0.32            -0.0        0.31        perf-profile.self.cycles-pp.unmap_vmas
> > >       0.21            -0.0        0.20        perf-profile.self.cycles-pp.__get_unmapped_area
> > >       0.18 ±  2%      -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.lru_add_drain_cpu
> > >       0.06            -0.0        0.05        perf-profile.self.cycles-pp.ksm_madvise
> > >       0.45            +0.0        0.46        perf-profile.self.cycles-pp.do_vmi_munmap
> > >       0.37            +0.0        0.39        perf-profile.self.cycles-pp._raw_spin_lock
> > >       1.06            +0.1        1.18        perf-profile.self.cycles-pp.mas_next_slot
> > >       1.50            +0.5        1.97        perf-profile.self.cycles-pp.mas_find
> > >       0.00            +1.4        1.35        perf-profile.self.cycles-pp.can_modify_mm
> > >       3.13            +2.0        5.13        perf-profile.self.cycles-pp.mas_walk
> > >
> > >
> > > ***************************************************************************************************
> > > lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/pkey/stress-ng/60s
> > >
> > > commit:
> > >   ff388fe5c4 ("mseal: wire up mseal syscall")
> > >   8be7258aad ("mseal: add mseal syscall")
> > >
> > > ff388fe5c481d39c 8be7258aad44b5e25977a98db13
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >      10539            -2.5%      10273        vmstat.system.cs
> > >       0.28 ±  5%     -20.1%       0.22 ±  7%  sched_debug.cfs_rq:/.h_nr_running.stddev
> > >       1419 ±  7%     -15.3%       1202 ±  6%  sched_debug.cfs_rq:/.util_avg.max
> > >       0.28 ±  6%     -18.4%       0.23 ±  8%  sched_debug.cpu.nr_running.stddev
> > >  8.736e+08            -3.6%  8.423e+08        stress-ng.pkey.ops
> > >   14560560            -3.6%   14038795        stress-ng.pkey.ops_per_sec
> > >     770.39 ±  4%      -5.0%     732.04        stress-ng.time.user_time
> > >     244657 ±  3%      +5.8%     258782 ±  3%  proc-vmstat.nr_slab_unreclaimable
> > >   73133541            -2.1%   71588873        proc-vmstat.numa_hit
> > >   72873579            -2.1%   71357274        proc-vmstat.numa_local
> > >  1.842e+08            -2.5%  1.796e+08        proc-vmstat.pgalloc_normal
> > >  1.767e+08            -2.8%  1.717e+08        proc-vmstat.pgfree
> > >    1345346 ± 40%     -73.1%     362064 ±124%  numa-vmstat.node0.nr_inactive_anon
> > >    1345340 ± 40%     -73.1%     362062 ±124%  numa-vmstat.node0.nr_zone_inactive_anon
> > >    2420830 ± 14%     +35.1%    3270248 ± 16%  numa-vmstat.node1.nr_file_pages
> > >    2067871 ± 13%     +51.5%    3132982 ± 17%  numa-vmstat.node1.nr_inactive_anon
> > >     191406 ± 17%     +33.6%     255808 ± 14%  numa-vmstat.node1.nr_mapped
> > >       2452 ± 61%    +104.4%       5012 ± 35%  numa-vmstat.node1.nr_page_table_pages
> > >    2067853 ± 13%     +51.5%    3132966 ± 17%  numa-vmstat.node1.nr_zone_inactive_anon
> > >    5379238 ± 40%     -73.0%    1453605 ±123%  numa-meminfo.node0.Inactive
> > >    5379166 ± 40%     -73.0%    1453462 ±123%  numa-meminfo.node0.Inactive(anon)
> > >    8741077 ± 22%     -36.7%    5531290 ± 28%  numa-meminfo.node0.MemUsed
> > >    9651902 ± 13%     +35.8%   13105318 ± 16%  numa-meminfo.node1.FilePages
> > >    8239855 ± 13%     +52.4%   12556929 ± 17%  numa-meminfo.node1.Inactive
> > >    8239712 ± 13%     +52.4%   12556853 ± 17%  numa-meminfo.node1.Inactive(anon)
> > >     761944 ± 18%     +34.6%    1025906 ± 14%  numa-meminfo.node1.Mapped
> > >   11679628 ± 11%     +31.2%   15322841 ± 14%  numa-meminfo.node1.MemUsed
> > >       9874 ± 62%    +104.6%      20200 ± 36%  numa-meminfo.node1.PageTables
> > >       0.74            -4.2%       0.71        perf-stat.i.MPKI
> > >  1.245e+11            +2.3%  1.274e+11        perf-stat.i.branch-instructions
> > >       0.37            -0.0        0.35        perf-stat.i.branch-miss-rate%
> > >  4.359e+08            -2.1%  4.265e+08        perf-stat.i.branch-misses
> > >  4.672e+08            -2.6%  4.548e+08        perf-stat.i.cache-misses
> > >  7.276e+08            -2.7%  7.082e+08        perf-stat.i.cache-references
> > >       1.00            -1.6%       0.98        perf-stat.i.cpi
> > >       1364            +2.9%       1404        perf-stat.i.cycles-between-cache-misses
> > >  6.392e+11            +1.7%  6.499e+11        perf-stat.i.instructions
> > >       1.00            +1.6%       1.02        perf-stat.i.ipc
> > >       0.74            -4.3%       0.71        perf-stat.overall.MPKI
> > >       0.35            -0.0        0.33        perf-stat.overall.branch-miss-rate%
> > >       1.00            -1.6%       0.99        perf-stat.overall.cpi
> > >       1356            +2.9%       1395        perf-stat.overall.cycles-between-cache-misses
> > >       1.00            +1.6%       1.01        perf-stat.overall.ipc
> > >  1.209e+11            +1.9%  1.232e+11        perf-stat.ps.branch-instructions
> > >  4.188e+08            -2.6%  4.077e+08        perf-stat.ps.branch-misses
> > >  4.585e+08            -3.1%  4.441e+08        perf-stat.ps.cache-misses
> > >  7.124e+08            -3.1%  6.901e+08        perf-stat.ps.cache-references
> > >      10321            -2.6%      10053        perf-stat.ps.context-switches
> > >
> > >
> > >
> > >
> > >
> > > Disclaimer:
> > > Results have been estimated based on internal Intel analysis and are provided
> > > for informational purposes only. Any difference in system hardware or software
> > > design or configuration may affect actual performance.
> > >
> > >
> > > --
> > > 0-DAY CI Kernel Test Service
> > > https://github.com/intel/lkp-tests/wiki
> > >


* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06  6:04           ` Oliver Sang
  2024-08-06 14:38             ` Linus Torvalds
@ 2024-08-06 21:37             ` Pedro Falcato
  2024-08-07  5:54               ` Oliver Sang
  1 sibling, 1 reply; 29+ messages in thread
From: Pedro Falcato @ 2024-08-06 21:37 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Linus Torvalds, Jeff Xu, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Jeff Xu, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Kees Cook, Liam R. Howlett, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jonathan Corbet,
	Jorge Lucangeli Obes, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin

On Tue, Aug 6, 2024 at 7:05 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Linus,
>
> On Mon, Aug 05, 2024 at 12:33:58PM -0700, Linus Torvalds wrote:
> > On Mon, 5 Aug 2024 at 11:55, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > So please consider this a "maybe something like this" patch, but that
> > > 'arch_unmap()' really is pretty nasty
> >
> > Actually, the whole powerpc vdso code confused me. It's not the vvar
> > thing that wants this close thing, it's the other ones that have the
> > remap thing.
> >
> > .. and there were two of those error cases that needed to reset the
> > vdso pointer.
> >
> > That all shows just how carefully I was reading this code.
> >
> > New version - still untested, but now I've read through it one more
> > time - attached.
>
> we tested this version by applying it directly upon 8be7258aad,  but seems it
> have little impact to performance. still similar regression if comparing to
> ff388fe5c4.

Hi,

I've just sent out a patch set[1] that should alleviate (or hopefully
totally fix) these performance regressions. It'd be great if you could
test it.

For everyone: Apologies if you're in the CC list and I didn't CC you,
but I tried to keep my patch set's CC list relatively short and clean
(and I focused on the active participants).
Everyone's comments are very welcome.

[1]: https://lore.kernel.org/all/20240806212808.1885309-1-pedro.falcato@gmail.com/
-- 
Pedro


* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06 21:37             ` Pedro Falcato
@ 2024-08-07  5:54               ` Oliver Sang
  0 siblings, 0 replies; 29+ messages in thread
From: Oliver Sang @ 2024-08-07  5:54 UTC (permalink / raw)
  To: Pedro Falcato
  Cc: Linus Torvalds, Jeff Xu, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Jeff Xu, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Kees Cook, Liam R. Howlett, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jonathan Corbet,
	Jorge Lucangeli Obes, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin, oliver.sang

hi, Pedro,

On Tue, Aug 06, 2024 at 10:37:08PM +0100, Pedro Falcato wrote:
> On Tue, Aug 6, 2024 at 7:05 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Linus,
> >
> > On Mon, Aug 05, 2024 at 12:33:58PM -0700, Linus Torvalds wrote:
> > > On Mon, 5 Aug 2024 at 11:55, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > So please consider this a "maybe something like this" patch, but that
> > > > 'arch_unmap()' really is pretty nasty
> > >
> > > Actually, the whole powerpc vdso code confused me. It's not the vvar
> > > thing that wants this close thing, it's the other ones that have the
> > > remap thing.
> > >
> > > .. and there were two of those error cases that needed to reset the
> > > vdso pointer.
> > >
> > > That all shows just how carefully I was reading this code.
> > >
> > > New version - still untested, but now I've read through it one more
> > > time - attached.
> >
> > we tested this version by applying it directly upon 8be7258aad,  but seems it
> > have little impact to performance. still similar regression if comparing to
> > ff388fe5c4.
> 
> Hi,
> 
> I've just sent out a patch set[1] that should alleviate (or hopefully
> totally fix) these performance regressions. It'd be great if you could
> test it.

yes, your patch set totally fixes the regression.

our bot automatically fetched the patch set and applied it on top of mainline
d4560686726f7, as below.

d58de4f958df2 (linux-review/Pedro-Falcato/mm-Move-can_modify_vma-to-mm-internal-h/20240807-054658) mm: Remove can_modify_mm()
32668c3efc23f mseal: Replace can_modify_mm_madv with a vma variant
5c3f48cf634c9 mseal: Fix is_madv_discard()
8cde2d71bd0f8 mm/mremap: Replace can_modify_mm with can_modify_vma
cc3471461a854 mm/mprotect: Replace can_modify_mm with can_modify_vma
abff8a9b6023e mm/munmap: Replace can_modify_mm with can_modify_vma
c1bf07aa19804 mm: Move can_modify_vma to mm/internal.h
d4560686726f7 (HEAD, linus/master) Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

I tested the patch set tip d58de4f958df2 as well as d4560686726f7; below are
the results, combined with those for 8be7258aad and its parent.

the data from 8be7258aad and d4560686726f7 are close enough to be within the noise.
the patch set tip recovers the performance to the level of ff388fe5c4.
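
For readers checking the table that follows: each %change column appears to be
the relative difference of that commit against the first column (ff388fe5c4).
A minimal sketch in Python, assuming that interpretation; the helper name
pct_change is hypothetical, and the numbers are the
stress-ng.pagemove.ops_per_sec values copied from the table.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# stress-ng.pagemove.ops_per_sec, copied from the comparison table.
BASE = 306349  # ff388fe5c4 (baseline column)
RESULTS = {
    "8be7258aad": 291188,  # commit the regression was reported against
    "d456068672": 290820,  # mainline the patch set was applied on
    "d58de4f958": 305268,  # patch set tip
}

for commit, value in RESULTS.items():
    # One decimal place, matching the precision used in the table.
    print(f"{commit}: {pct_change(BASE, value):+.1f}%")
```

Under that assumption the script reproduces the -4.9%, -5.1% and -0.4% shown in
the stress-ng.pagemove.ops_per_sec row.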


=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pagemove/stress-ng/60s

commit:
  ff388fe5c4 ("mseal: wire up mseal syscall")
  8be7258aad ("mseal: add mseal syscall")
  d456068672 ("Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost")
  d58de4f958 ("mm: Remove can_modify_mm()")

ff388fe5c481d39c 8be7258aad44b5e25977a98db13 d4560686726f7a357922f300fc8 d58de4f958df225c04fd490fe2d
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
     44.92            -0.4%      44.76            -5.1%      42.62            -5.7%      42.37        boot-time.boot
     33.12            -0.4%      33.00            -7.0%      30.81            -7.0%      30.81        boot-time.dhcp
      2631            -0.4%       2620            -5.6%       2483            -6.2%       2468        boot-time.idle
      4958            +1.3%       5024            +1.2%       5017            +0.0%       4960        time.percent_of_cpu_this_job_got
      2916            +1.5%       2960            +1.4%       2956            +0.1%       2919        time.system_time
     65.85            -7.0%      61.27            -6.8%      61.40            -3.4%      63.64        time.user_time
     17869 ±  8%      -5.6%      16869 ± 28%     -24.5%      13488 ± 25%      -3.5%      17240 ±  9%  numa-vmstat.node0.nr_slab_reclaimable
      5182 ± 29%     +19.8%       6207 ± 75%     +80.1%       9334 ± 36%      +7.9%       5591 ± 28%  numa-vmstat.node1.nr_slab_reclaimable
     10153 ±170%   +1041.4%     115893 ±214%   +2787.4%     293183 ± 97%    +371.7%      47894 ± 90%  numa-vmstat.node1.nr_unevictable
     10153 ±170%   +1041.4%     115893 ±214%   +2787.4%     293183 ± 97%    +371.7%      47894 ± 90%  numa-vmstat.node1.nr_zone_unevictable
     71475 ±  8%      -5.6%      67478 ± 28%     -24.5%      53952 ± 25%      -3.5%      68960 ±  9%  numa-meminfo.node0.KReclaimable
     71475 ±  8%      -5.6%      67478 ± 28%     -24.5%      53952 ± 25%      -3.5%      68960 ±  9%  numa-meminfo.node0.SReclaimable
     20732 ± 29%     +19.8%      24839 ± 75%     +80.1%      37346 ± 36%      +7.9%      22364 ± 28%  numa-meminfo.node1.KReclaimable
     20732 ± 29%     +19.8%      24839 ± 75%     +80.1%      37346 ± 36%      +7.9%      22364 ± 28%  numa-meminfo.node1.SReclaimable
     40615 ±170%   +1041.4%     463573 ±214%   +2787.4%    1172733 ± 97%    +371.7%     191576 ± 90%  numa-meminfo.node1.Unevictable
     23051            +0.1%      23079            -1.0%      22823            -1.0%      22831        proc-vmstat.nr_slab_reclaimable
  41535129            -4.5%   39669773            -4.9%   39501465            -0.3%   41415171        proc-vmstat.numa_hit
  41465484            -4.5%   39602956            -4.9%   39434855            -0.3%   41347677        proc-vmstat.numa_local
  77303973            -4.6%   73780662            -5.0%   73449965            -0.3%   77049179        proc-vmstat.pgalloc_normal
  77022096            -4.6%   73502058            -5.0%   73168463            -0.3%   76769054        proc-vmstat.pgfree
  18381956            -4.9%   17473438            -5.1%   17450543            -0.4%   18316849        stress-ng.pagemove.ops
    306349            -4.9%     291188            -5.1%     290820            -0.4%     305268        stress-ng.pagemove.ops_per_sec
    209930            -6.2%     196996 ±  2%      -5.4%     198614            -0.5%     208922        stress-ng.pagemove.page_remaps_per_sec
      4958            +1.3%       5024            +1.2%       5017            +0.0%       4960        stress-ng.time.percent_of_cpu_this_job_got
      2916            +1.5%       2960            +1.4%       2956            +0.1%       2919        stress-ng.time.system_time
 3.337e+10 ±  4%      +2.3%  3.414e+10 ±  3%      +5.0%  3.503e+10            +1.2%  3.376e+10        perf-stat.i.branch-instructions
      1.13            -2.1%       1.10            -2.3%       1.10            +0.1%       1.13        perf-stat.i.cpi
 1.695e+11 ±  4%      +1.1%  1.715e+11 ±  3%      +3.8%  1.761e+11            +1.2%  1.715e+11        perf-stat.i.instructions
      0.89            +2.2%       0.91            +2.1%       0.91            -0.4%       0.89        perf-stat.i.ipc
      1.04            -7.2%       0.97            -7.2%       0.97            -0.2%       1.04        perf-stat.overall.MPKI
      1.13            -2.3%       1.10            -2.1%       1.10            +0.3%       1.13        perf-stat.overall.cpi
      1082            +5.4%       1140            +5.5%       1141            +0.5%       1087        perf-stat.overall.cycles-between-cache-misses
      0.89            +2.3%       0.91            +2.1%       0.91            -0.3%       0.88        perf-stat.overall.ipc
 3.284e+10 ±  4%      +2.4%  3.362e+10 ±  2%      +4.8%  3.443e+10            +1.1%   3.32e+10        perf-stat.ps.branch-instructions
    192.79            -3.9%     185.32 ±  2%      -1.7%     189.49            +0.2%     193.10        perf-stat.ps.cpu-migrations
 1.669e+11 ±  4%      +1.2%  1.689e+11 ±  2%      +3.7%  1.731e+11            +1.1%  1.687e+11        perf-stat.ps.instructions
 1.048e+13            +2.8%  1.078e+13            +2.1%   1.07e+13            -0.6%  1.042e+13        perf-stat.total.instructions
     74.97            -1.9       73.07            -1.7       73.32            +0.4       75.38        perf-profile.calltrace.cycles-pp.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
     36.79            -1.6       35.22            -1.4       35.36            +0.3       37.08        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
     24.98            -1.3       23.64            -1.3       23.73            +0.0       24.99        perf-profile.calltrace.cycles-pp.copy_vma.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     19.91            -1.1       18.85            -1.2       18.69            -0.2       19.72        perf-profile.calltrace.cycles-pp.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.9        9.78 ±  2%      -0.4       10.33 ±  3%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.9        9.78 ±  2%      -0.4       10.33 ±  3%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.9        9.78 ±  2%      -0.4       10.33 ±  3%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
     10.64 ±  3%      -0.9        9.79 ±  3%      -0.9        9.73 ±  2%      -0.4       10.29 ±  3%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     10.63 ±  3%      -0.9        9.78 ±  3%      -0.9        9.72 ±  2%      -0.4       10.28 ±  3%  perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
     10.63 ±  3%      -0.9        9.78 ±  3%      -0.9        9.72 ±  2%      -0.4       10.28 ±  3%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     10.63 ±  3%      -0.9        9.78 ±  3%      -0.9        9.72 ±  2%      -0.4       10.28 ±  3%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
     10.59 ±  3%      -0.8        9.74 ±  3%      -0.9        9.68 ±  2%      -0.4       10.24 ±  3%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
     14.77            -0.8       14.00            -0.7       14.11            +0.0       14.80        perf-profile.calltrace.cycles-pp.vma_merge.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      1.48            -0.5        0.99            -0.5        0.99            +0.0        1.52        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
      5.95 ±  3%      -0.5        5.47 ±  3%      -0.5        5.44 ±  2%      -0.2        5.73 ±  3%  perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      7.88            -0.4        7.48            -0.3        7.57            +0.1        7.97        perf-profile.calltrace.cycles-pp.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.62 ±  3%      -0.4        4.25 ±  3%      -0.4        4.20 ±  2%      -0.2        4.42 ±  3%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.rcu_do_batch.rcu_core.handle_softirqs
      6.72            -0.4        6.36            -0.4        6.33            -0.1        6.66        perf-profile.calltrace.cycles-pp.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      6.15            -0.3        5.82            -0.3        5.86            +0.0        6.16        perf-profile.calltrace.cycles-pp.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      6.11            -0.3        5.78            -0.3        5.77            -0.0        6.07        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      5.78            -0.3        5.49            -0.2        5.57            +0.1        5.85        perf-profile.calltrace.cycles-pp.move_ptes.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
      5.54            -0.3        5.25            -0.3        5.28            +0.0        5.56        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      5.56            -0.3        5.28            -0.3        5.28            -0.0        5.54        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_merge.copy_vma.move_vma.__do_sys_mremap
      5.19            -0.3        4.92            -0.2        4.95            +0.0        5.21        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_complete.__split_vma.do_vmi_align_munmap
      5.20            -0.3        4.94            -0.3        4.95            -0.0        5.18        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma.move_vma
      3.20 ±  4%      -0.3        2.94 ±  3%      -0.3        2.93 ±  2%      -0.1        3.11 ±  3%  perf-profile.calltrace.cycles-pp.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      4.09            -0.2        3.85            -0.3        3.82            -0.1        4.03        perf-profile.calltrace.cycles-pp.vm_area_dup.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      4.68            -0.2        4.45            -0.2        4.46            -0.0        4.67        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge.copy_vma
      2.63 ±  3%      -0.2        2.42 ±  3%      -0.2        2.43 ±  2%      -0.1        2.57 ±  3%  perf-profile.calltrace.cycles-pp.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core.handle_softirqs
      2.36 ±  2%      -0.2        2.16 ±  4%      -0.3        2.04 ± 14%      -0.1        2.28 ±  3%  perf-profile.calltrace.cycles-pp.__memcpy.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_complete
      3.56            -0.2        3.36            -0.2        3.34            -0.0        3.52        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma.__do_sys_mremap
      4.00            -0.2        3.81            -0.1        3.87 ±  2%      +0.1        4.06        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_complete.__split_vma
      1.35            -0.2        1.16            -0.2        1.16            +0.0        1.36        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
      3.40            -0.2        3.22            -0.2        3.24            +0.0        3.41        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma.__do_sys_mremap
      2.22            -0.2        2.06            -0.2        2.07            +0.0        2.24        perf-profile.calltrace.cycles-pp.find_vma_prev.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      0.96            -0.2        0.82            -0.2        0.81            +0.0        0.97        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.do_munmap.mremap_to
      3.25            -0.1        3.10            -0.1        3.14            +0.0        3.30        perf-profile.calltrace.cycles-pp.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      1.81 ±  4%      -0.1        1.67 ±  3%      -0.2        1.64 ±  2%      -0.1        1.74 ±  3%  perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
      1.97 ±  3%      -0.1        1.83 ±  3%      -0.6        1.41 ±  3%      -0.5        1.50 ±  2%  perf-profile.calltrace.cycles-pp.mod_objcg_state.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.__split_vma
      2.26            -0.1        2.12            -0.2        2.05            -0.1        2.16        perf-profile.calltrace.cycles-pp.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      3.10            -0.1        2.96            +0.3        3.38            +0.5        3.60        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
      3.13            -0.1        2.99            -0.1        3.06            +0.1        3.23        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
      2.97            -0.1        2.85            -0.2        2.75 ±  2%      -0.0        2.94 ±  2%  perf-profile.calltrace.cycles-pp.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap.move_vma
      2.05            -0.1        1.93            -0.1        1.98            -0.1        1.99        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_merge.copy_vma.move_vma.__do_sys_mremap
      8.26            -0.1        8.14            +0.2        8.45            +0.5        8.78        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      2.45            -0.1        2.34            -0.1        2.34            +0.0        2.46        perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma
      2.43            -0.1        2.32            -0.0        2.39            +0.1        2.55        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.75 ±  2%      -0.1        1.64 ±  3%      -0.1        1.64 ±  4%      +0.0        1.77 ±  4%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_merge.copy_vma.move_vma.__do_sys_mremap
      0.54            -0.1        0.44 ± 37%      -0.0        0.51            +0.0        0.55        perf-profile.calltrace.cycles-pp.mas_find.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
      1.27 ±  2%      -0.1        1.16 ±  4%      -0.1        1.14 ±  6%      -0.0        1.23 ±  4%  perf-profile.calltrace.cycles-pp.__memcpy.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge
      1.32 ±  3%      -0.1        1.22 ±  3%      -0.1        1.20 ±  2%      -0.0        1.28 ±  3%  perf-profile.calltrace.cycles-pp.rcu_cblist_dequeue.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      2.21            -0.1        2.11            -0.1        2.11            +0.0        2.23        perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.move_ptes.move_page_tables
      1.85            -0.1        1.76            -0.1        1.78            +0.0        1.87        perf-profile.calltrace.cycles-pp.vma_link.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      2.14 ±  2%      -0.1        2.05 ±  2%      -0.1        2.00 ±  2%      +0.0        2.14 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      1.79 ±  2%      -0.1        1.70            +0.1        1.93            +0.3        2.06        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma
      1.40            -0.1        1.31            -0.1        1.27            -0.1        1.34        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap.do_vmi_munmap
      1.39            -0.1        1.30            -0.1        1.34            -0.1        1.33        perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma.move_vma
      1.24            -0.1        1.16            -0.1        1.13            -0.1        1.19        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma.do_vmi_align_munmap
      0.94            -0.1        0.86            -0.1        0.86            +0.0        0.96        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.move_vma.__do_sys_mremap
      1.23            -0.1        1.15            -0.0        1.18            -0.1        1.18        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge.copy_vma
      1.54            -0.1        1.46            -0.0        1.50            +0.1        1.60        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
      0.73            -0.1        0.67            -0.1        0.67            +0.0        0.74        perf-profile.calltrace.cycles-pp.mas_walk.find_vma_prev.copy_vma.move_vma.__do_sys_mremap
      1.15            -0.1        1.09            -0.1        1.08            -0.0        1.13        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.do_vmi_align_munmap
      0.60 ±  2%      -0.1        0.54            -0.0        0.56            -0.0        0.59        perf-profile.calltrace.cycles-pp.security_mmap_addr.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64
      1.27            -0.1        1.21            -0.0        1.22            +0.0        1.30        perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma
     38.74            -0.1       38.68            +0.1       38.80            +0.3       39.06        perf-profile.calltrace.cycles-pp.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.38 ±  4%      -0.1        1.32 ±  2%      -0.2        1.20 ±  3%      -0.1        1.27 ±  2%  perf-profile.calltrace.cycles-pp.obj_cgroup_charge.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.__split_vma
      0.72            -0.1        0.66            -0.1        0.66            +0.0        0.72        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge.copy_vma.move_vma.__do_sys_mremap
      0.70 ±  2%      -0.1        0.64 ±  3%      +0.1        0.80 ±  3%      +0.2        0.85 ±  3%  perf-profile.calltrace.cycles-pp.mod_objcg_state.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.copy_vma
      0.79            -0.1        0.73            -0.1        0.73            +0.0        0.79        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
      0.80 ±  2%      -0.1        0.75            -0.1        0.72 ±  3%      -0.0        0.77 ±  2%  perf-profile.calltrace.cycles-pp.__call_rcu_common.mas_wr_node_store.mas_wr_store_entry.mas_store_prealloc.vma_merge
      0.78            -0.1        0.72            -0.0        0.73            +0.0        0.78        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
      1.02            -0.1        0.96            +0.0        1.02            +0.1        1.09        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
      1.63            -0.1        1.58            -0.1        1.58            +0.0        1.64        perf-profile.calltrace.cycles-pp.__get_unmapped_area.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.62            -0.0        0.58            -0.1        0.57            +0.0        0.63        perf-profile.calltrace.cycles-pp.mas_prev_setup.mas_prev.vma_merge.copy_vma.move_vma
      0.60 ±  3%      -0.0        0.56 ±  3%      -0.0        0.59 ±  3%      +0.0        0.63 ±  3%  perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.vm_area_free_rcu_cb.rcu_do_batch.rcu_core
      0.67            -0.0        0.62            -0.1        0.59            -0.1        0.61 ±  2%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      0.86            -0.0        0.81            -0.0        0.82            +0.0        0.87        perf-profile.calltrace.cycles-pp.mtree_load.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64
      1.02            -0.0        0.97            -0.0        0.98            +0.0        1.04        perf-profile.calltrace.cycles-pp.vma_to_resize.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.76 ±  2%      -0.0        0.71            -0.1        0.71 ±  2%      -0.0        0.74 ±  2%  perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.__split_vma
      0.81            -0.0        0.77            -0.1        0.76            -0.0        0.81        perf-profile.calltrace.cycles-pp.mtree_load.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
      0.70            -0.0        0.66            -0.0        0.66            -0.0        0.69        perf-profile.calltrace.cycles-pp.__call_rcu_common.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      0.67 ±  2%      -0.0        0.63            -0.0        0.65 ±  2%      +0.0        0.68        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.mremap
      0.56            -0.0        0.51            -0.2        0.38 ± 57%      +0.0        0.56        perf-profile.calltrace.cycles-pp.mas_walk.mas_prev_setup.mas_prev.vma_merge.copy_vma
      0.69            -0.0        0.65            -0.0        0.64 ±  2%      -0.0        0.68        perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_noprof.vm_area_dup.copy_vma.move_vma
      0.98            -0.0        0.93            -0.0        0.94            +0.0        0.98        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      0.77 ±  5%      -0.0        0.73 ±  2%      -0.1        0.66 ±  4%      -0.1        0.70 ±  4%  perf-profile.calltrace.cycles-pp.obj_cgroup_charge.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.vm_area_dup.copy_vma
      0.78            -0.0        0.74            -0.0        0.75            +0.0        0.79        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_link.copy_vma.move_vma.__do_sys_mremap
      1.12            -0.0        1.08            -0.1        1.06            +0.0        1.12        perf-profile.calltrace.cycles-pp.clear_bhb_loop.mremap
      0.68            -0.0        0.65            -0.0        0.66            +0.0        0.68        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_link.copy_vma.move_vma.__do_sys_mremap
      1.00            -0.0        0.97            -0.0        0.96            +0.0        1.02        perf-profile.calltrace.cycles-pp.mt_find.vma_merge.copy_vma.move_vma.__do_sys_mremap
      0.62            -0.0        0.59            -0.0        0.59            -0.0        0.62        perf-profile.calltrace.cycles-pp.mas_prev_slot.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
      0.88            -0.0        0.85            -0.0        0.85            +0.0        0.88        perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
      1.15            -0.0        1.12            -0.1        1.08            -0.0        1.13        perf-profile.calltrace.cycles-pp.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
      0.60            -0.0        0.57 ±  2%      +0.0        0.62            +0.1        0.66        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.59            -0.0        0.56            -0.0        0.56            -0.0        0.57        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.mremap
      0.62 ±  2%      -0.0        0.59 ±  2%      -0.0        0.59            +0.0        0.63        perf-profile.calltrace.cycles-pp.get_old_pud.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
      0.65            -0.0        0.63            -0.0        0.63            +0.0        0.66        perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma
      0.55            -0.0        0.53            +0.0        0.58            +0.1        0.61        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.move_ptes.move_page_tables.move_vma.__do_sys_mremap
      0.74            -0.0        0.72            -0.1        0.68 ±  2%      -0.0        0.71 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.anon_vma_clone.copy_vma.move_vma.__do_sys_mremap
      0.67            +0.1        0.74            +0.1        0.73            +0.0        0.68        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
      0.76            +0.1        0.84            +0.1        0.82            +0.0        0.78        perf-profile.calltrace.cycles-pp.__madvise
      0.66            +0.1        0.74            +0.1        0.73            +0.0        0.67        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.63            +0.1        0.71            +0.1        0.70            +0.0        0.64        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.62            +0.1        0.70            +0.1        0.69            +0.0        0.64        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      3.47            +0.1        3.55            +0.4        3.89            +0.5        3.95        perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
     87.67            +0.8       88.47            +0.9       88.53            +0.3       88.01        perf-profile.calltrace.cycles-pp.mremap
      0.00            +0.9        0.86            +0.8        0.84            +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.do_munmap
      0.00            +0.9        0.88            +0.9        0.86            +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.mremap_to.__do_sys_mremap
      0.00            +0.9        0.90 ±  2%      +0.9        0.90            +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.can_modify_mm.do_vmi_munmap.move_vma
     84.82            +1.0       85.80            +1.0       85.84            +0.4       85.19        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mremap
     84.66            +1.0       85.65            +1.0       85.69            +0.4       85.04        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
     83.71            +1.0       84.73            +1.2       84.89            +0.5       84.18        perf-profile.calltrace.cycles-pp.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
      0.00            +1.1        1.10            +1.1        1.08            +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64
      0.00            +1.2        1.21            +1.2        1.20            +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to
      2.09            +1.5        3.60            +1.5        3.59            +0.0        2.11        perf-profile.calltrace.cycles-pp.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.5        1.51            +1.5        1.50            +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_find.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap
      1.59            +1.5        3.12            +1.5        3.11            +0.0        1.60        perf-profile.calltrace.cycles-pp.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap.do_syscall_64
      0.00            +1.6        1.62            +1.6        1.59            +0.0        0.00        perf-profile.calltrace.cycles-pp.can_modify_mm.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.7        1.72            +1.7        1.72            +0.0        0.00        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.do_munmap.mremap_to.__do_sys_mremap
      0.00            +2.0        2.01            +2.0        1.99            +0.0        0.00        perf-profile.calltrace.cycles-pp.can_modify_mm.do_vmi_munmap.move_vma.__do_sys_mremap.do_syscall_64
      5.34            +3.0        8.38            +3.0        8.34            +0.1        5.41        perf-profile.calltrace.cycles-pp.mremap_to.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe.mremap
     75.13            -1.9       73.22            -1.7       73.47            +0.4       75.55        perf-profile.children.cycles-pp.move_vma
     37.01            -1.6       35.43            -1.4       35.56            +0.3       37.30        perf-profile.children.cycles-pp.do_vmi_align_munmap
     25.06            -1.3       23.71            -1.3       23.80            +0.0       25.06        perf-profile.children.cycles-pp.copy_vma
     20.00            -1.1       18.94            -1.2       18.77            -0.2       19.81        perf-profile.children.cycles-pp.__split_vma
     19.86            -1.0       18.87            -0.9       18.92            -0.0       19.84        perf-profile.children.cycles-pp.rcu_core
     19.84            -1.0       18.85            -0.9       18.90            -0.0       19.82        perf-profile.children.cycles-pp.rcu_do_batch
     19.88            -1.0       18.89            -0.9       18.94            -0.0       19.86        perf-profile.children.cycles-pp.handle_softirqs
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.9        9.78 ±  2%      -0.4       10.33 ±  3%  perf-profile.children.cycles-pp.kthread
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.9        9.78 ±  2%      -0.4       10.34 ±  3%  perf-profile.children.cycles-pp.ret_from_fork
     10.70 ±  3%      -0.9        9.84 ±  3%      -0.9        9.78 ±  2%      -0.4       10.34 ±  3%  perf-profile.children.cycles-pp.ret_from_fork_asm
     10.64 ±  3%      -0.9        9.79 ±  3%      -0.9        9.73 ±  2%      -0.4       10.29 ±  3%  perf-profile.children.cycles-pp.smpboot_thread_fn
     10.63 ±  3%      -0.9        9.78 ±  3%      -0.9        9.72 ±  2%      -0.4       10.28 ±  3%  perf-profile.children.cycles-pp.run_ksoftirqd
     17.53            -0.8       16.70            -0.8       16.76            +0.0       17.54        perf-profile.children.cycles-pp.kmem_cache_free
     15.28            -0.8       14.47            -1.0       14.33            -0.2       15.04        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
     15.16            -0.8       14.37            -0.7       14.48            +0.0       15.20        perf-profile.children.cycles-pp.vma_merge
     12.18            -0.6       11.54            -0.6       11.60            +0.0       12.20        perf-profile.children.cycles-pp.mas_wr_store_entry
     11.98            -0.6       11.36            -0.6       11.41            +0.0       11.98        perf-profile.children.cycles-pp.mas_store_prealloc
     12.11            -0.6       11.51            -0.6       11.50            -0.1       12.02        perf-profile.children.cycles-pp.__slab_free
     10.86            -0.6       10.26            -0.7       10.21            -0.1       10.75        perf-profile.children.cycles-pp.vm_area_dup
      9.89            -0.5        9.40            -0.5        9.44            +0.0        9.93        perf-profile.children.cycles-pp.mas_wr_node_store
      8.36            -0.4        7.92            -0.4        7.97            +0.1        8.49        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      7.98            -0.4        7.58            -0.3        7.68            +0.1        8.08        perf-profile.children.cycles-pp.move_page_tables
      6.69            -0.4        6.33            -0.3        6.39            +0.0        6.72        perf-profile.children.cycles-pp.vma_complete
      5.86            -0.3        5.56            -0.2        5.64            +0.1        5.93        perf-profile.children.cycles-pp.move_ptes
      5.11            -0.3        4.81            -0.3        4.80            -0.2        4.95        perf-profile.children.cycles-pp.mas_preallocate
      6.05            -0.3        5.75            -0.3        5.77            +0.0        6.07        perf-profile.children.cycles-pp.vm_area_free_rcu_cb
      2.98 ±  2%      -0.3        2.73 ±  4%      -0.3        2.66 ±  6%      -0.1        2.88 ±  3%  perf-profile.children.cycles-pp.__memcpy
      3.48            -0.2        3.26            -0.2        3.25            -0.0        3.45        perf-profile.children.cycles-pp.___slab_alloc
      3.46 ±  2%      -0.2        3.26            +0.3        3.71 ±  2%      +0.5        3.92 ±  2%  perf-profile.children.cycles-pp.mod_objcg_state
      2.91            -0.2        2.73            -0.2        2.73            -0.1        2.79        perf-profile.children.cycles-pp.mas_alloc_nodes
      2.43            -0.2        2.25            -0.2        2.27            +0.0        2.45        perf-profile.children.cycles-pp.find_vma_prev
      3.47            -0.2        3.29            -0.2        3.27 ±  2%      +0.0        3.50 ±  2%  perf-profile.children.cycles-pp.down_write
      3.46            -0.2        3.28            -0.2        3.30            +0.0        3.46        perf-profile.children.cycles-pp.flush_tlb_mm_range
      4.22            -0.2        4.06            -0.3        3.91            -0.1        4.16        perf-profile.children.cycles-pp.anon_vma_clone
      3.32            -0.2        3.17            -0.1        3.25            +0.1        3.42        perf-profile.children.cycles-pp.__memcg_slab_free_hook
      3.35            -0.2        3.20            -0.1        3.24            +0.0        3.40        perf-profile.children.cycles-pp.mas_store_gfp
      2.22            -0.1        2.07            -0.1        2.12            +0.0        2.24        perf-profile.children.cycles-pp.__cond_resched
      2.05 ±  2%      -0.1        1.91            -0.1        1.92            -0.0        2.04        perf-profile.children.cycles-pp.allocate_slab
      3.18            -0.1        3.04            -0.1        3.11            +0.1        3.28        perf-profile.children.cycles-pp.unmap_vmas
      2.24            -0.1        2.11 ±  2%      -0.1        2.10 ±  3%      +0.0        2.25 ±  3%  perf-profile.children.cycles-pp.vma_prepare
      2.12            -0.1        2.00            -0.2        1.95            -0.0        2.08        perf-profile.children.cycles-pp.__call_rcu_common
      2.66            -0.1        2.53            -0.1        2.53            +0.0        2.68        perf-profile.children.cycles-pp.mtree_load
      2.46            -0.1        2.34            -0.1        2.34            +0.0        2.47        perf-profile.children.cycles-pp.rcu_cblist_dequeue
      2.45 ±  4%      -0.1        2.33 ±  2%      -0.3        2.15 ±  3%      -0.2        2.28 ±  2%  perf-profile.children.cycles-pp.obj_cgroup_charge
      2.49            -0.1        2.38            -0.1        2.39            +0.0        2.51        perf-profile.children.cycles-pp.flush_tlb_func
      8.32            -0.1        8.21            +0.2        8.52            +0.5        8.85        perf-profile.children.cycles-pp.unmap_region
      2.48            -0.1        2.37            -0.0        2.44            +0.1        2.59        perf-profile.children.cycles-pp.unmap_page_range
      2.23            -0.1        2.13            -0.1        2.12            +0.0        2.24        perf-profile.children.cycles-pp.native_flush_tlb_one_user
      1.77            -0.1        1.67            -0.1        1.68            -0.0        1.76        perf-profile.children.cycles-pp.mas_wr_walk
      1.88            -0.1        1.78            -0.1        1.80            +0.0        1.89        perf-profile.children.cycles-pp.vma_link
      1.40            -0.1        1.31            -0.1        1.32            -0.0        1.40 ±  2%  perf-profile.children.cycles-pp.shuffle_freelist
      1.84            -0.1        1.75            -0.1        1.75            +0.0        1.85        perf-profile.children.cycles-pp.up_write
      0.97 ±  2%      -0.1        0.88            -0.1        0.90 ±  2%      -0.0        0.94 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
      1.03            -0.1        0.95            -0.1        0.94            +0.0        1.04        perf-profile.children.cycles-pp.mas_prev
      0.92            -0.1        0.85            -0.1        0.84            -0.0        0.92        perf-profile.children.cycles-pp.mas_prev_setup
      1.58            -0.1        1.50            -0.0        1.54            +0.1        1.64        perf-profile.children.cycles-pp.zap_pmd_range
      1.24            -0.1        1.17            -0.1        1.18            -0.0        1.24        perf-profile.children.cycles-pp.mas_prev_slot
      1.58            -0.1        1.51            -0.1        1.52            +0.0        1.59        perf-profile.children.cycles-pp.mas_update_gap
      0.62            -0.1        0.56            -0.0        0.58            -0.0        0.62        perf-profile.children.cycles-pp.security_mmap_addr
      0.49 ±  2%      -0.1        0.43            -0.0        0.44 ±  2%      -0.0        0.46 ±  3%  perf-profile.children.cycles-pp.setup_object
      0.90            -0.1        0.84            -0.1        0.75            -0.1        0.78        perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.98            -0.1        0.92            -0.0        0.97            +0.0        1.02        perf-profile.children.cycles-pp.mas_pop_node
      0.85            -0.1        0.80            -0.1        0.78            -0.0        0.84        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.68            -0.1        1.62            -0.1        1.62            +0.0        1.68        perf-profile.children.cycles-pp.__get_unmapped_area
      1.23            -0.1        1.18            +0.0        1.27            +0.1        1.34        perf-profile.children.cycles-pp.__pte_offset_map_lock
      1.08            -0.1        1.03            -0.0        1.08            +0.1        1.14        perf-profile.children.cycles-pp.zap_pte_range
      0.69 ±  2%      -0.0        0.64            -0.0        0.67 ±  2%      +0.0        0.70        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.04            -0.0        1.00            -0.0        1.00            +0.0        1.08        perf-profile.children.cycles-pp.vma_to_resize
      1.08            -0.0        1.04            -0.0        1.04            +0.0        1.10        perf-profile.children.cycles-pp.mas_leaf_max_gap
      0.51 ±  3%      -0.0        0.47            -0.0        0.47            -0.0        0.51        perf-profile.children.cycles-pp.anon_vma_interval_tree_insert
      1.18            -0.0        1.14            -0.1        1.12            +0.0        1.18        perf-profile.children.cycles-pp.clear_bhb_loop
      0.57            -0.0        0.53            -0.0        0.52 ±  2%      -0.0        0.54        perf-profile.children.cycles-pp.mas_wr_end_piv
      0.43            -0.0        0.40            -0.1        0.38            -0.0        0.41 ±  3%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      1.14            -0.0        1.10            -0.0        1.09            +0.0        1.15        perf-profile.children.cycles-pp.mt_find
      0.62            -0.0        0.58            -0.0        0.58            -0.0        0.61        perf-profile.children.cycles-pp.__put_partials
      0.46 ±  7%      -0.0        0.42 ±  2%      -0.0        0.43            -0.0        0.45        perf-profile.children.cycles-pp._raw_spin_lock
      0.90            -0.0        0.87            -0.0        0.88            +0.0        0.90        perf-profile.children.cycles-pp.userfaultfd_unmap_complete
      0.46 ±  3%      -0.0        0.42 ±  3%      -0.0        0.42 ±  2%      -0.0        0.45 ±  2%  perf-profile.children.cycles-pp.__alloc_pages_noprof
      0.61            -0.0        0.58            -0.0        0.58            -0.0        0.60        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.44 ±  3%      -0.0        0.40 ±  3%      -0.0        0.40 ±  2%      -0.0        0.43 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.48            -0.0        0.45 ±  2%      -0.0        0.45            -0.0        0.46        perf-profile.children.cycles-pp.mas_prev_range
      0.64            -0.0        0.61            -0.0        0.61            +0.0        0.65        perf-profile.children.cycles-pp.get_old_pud
      0.31 ±  2%      -0.0        0.28 ±  3%      -0.0        0.29 ±  2%      +0.0        0.32 ±  3%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      0.33 ±  3%      -0.0        0.30 ±  2%      -0.0        0.30 ±  2%      -0.0        0.32 ±  2%  perf-profile.children.cycles-pp.mas_put_in_tree
      0.32 ±  2%      -0.0        0.29 ±  2%      -0.0        0.30 ±  3%      -0.0        0.31 ±  2%  perf-profile.children.cycles-pp.tlb_finish_mmu
      0.47            -0.0        0.44 ±  2%      -0.0        0.42 ±  2%      -0.0        0.45        perf-profile.children.cycles-pp.rcu_segcblist_enqueue
      0.70 ±  3%      -0.0        0.68            -0.0        0.66 ±  3%      -0.1        0.60        perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
      0.32 ±  3%      -0.0        0.30 ±  2%      -0.0        0.30            -0.0        0.32        perf-profile.children.cycles-pp.free_unref_page
      0.55            -0.0        0.53            -0.0        0.55 ±  2%      +0.0        0.58        perf-profile.children.cycles-pp.refill_obj_stock
      0.33            -0.0        0.31            -0.0        0.32            +0.0        0.33        perf-profile.children.cycles-pp.mas_destroy
      0.25 ±  4%      -0.0        0.23 ±  3%      -0.0        0.23 ±  3%      -0.0        0.25 ±  2%  perf-profile.children.cycles-pp.rmqueue
      0.35            -0.0        0.34            -0.0        0.34            +0.0        0.36        perf-profile.children.cycles-pp.__rb_insert_augmented
      0.39            -0.0        0.37            -0.0        0.36 ±  2%      -0.0        0.38        perf-profile.children.cycles-pp.down_write_killable
      0.22 ±  4%      -0.0        0.20 ±  3%      -0.0        0.20 ±  3%      -0.0        0.22 ±  3%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.21 ±  4%      -0.0        0.19 ±  3%      -0.0        0.19 ±  3%      -0.0        0.20 ±  3%  perf-profile.children.cycles-pp.rmqueue_bulk
      0.52            -0.0        0.51 ±  2%      +0.1        0.59            +0.1        0.64        perf-profile.children.cycles-pp.__pte_offset_map
      0.30 ±  2%      -0.0        0.28 ±  2%      -0.1        0.23 ±  3%      -0.0        0.25 ±  3%  perf-profile.children.cycles-pp.__vm_enough_memory
      0.26            -0.0        0.24 ±  2%      -0.0        0.21            -0.0        0.22        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.28 ±  2%      -0.0        0.27 ±  2%      -0.0        0.26            -0.0        0.28 ±  2%  perf-profile.children.cycles-pp.free_unref_page_commit
      0.29            -0.0        0.27            -0.0        0.27 ±  2%      +0.0        0.29 ±  2%  perf-profile.children.cycles-pp.tlb_gather_mmu
      0.16 ±  2%      -0.0        0.14 ±  3%      -0.0        0.14 ±  2%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.mas_wr_append
      0.28 ±  2%      -0.0        0.26            +0.0        0.32            +0.1        0.33 ±  2%  perf-profile.children.cycles-pp.khugepaged_enter_vma
      0.32            -0.0        0.30            -0.0        0.30            -0.0        0.32 ±  2%  perf-profile.children.cycles-pp.mas_wr_store_setup
      0.09 ±  4%      -0.0        0.08 ±  5%      -0.0        0.06 ±  6%      -0.0        0.07        perf-profile.children.cycles-pp.vma_dup_policy
      0.43            -0.0        0.42            -0.0        0.41            +0.0        0.43        perf-profile.children.cycles-pp.mremap_userfaultfd_complete
      0.13 ±  6%      -0.0        0.12 ± 11%      -0.0        0.10 ±  4%      +0.0        0.13 ±  9%  perf-profile.children.cycles-pp.vm_stat_account
      0.36            -0.0        0.35            -0.0        0.35            +0.0        0.37        perf-profile.children.cycles-pp.madvise_vma_behavior
      0.18 ±  2%      -0.0        0.17 ±  2%      -0.0        0.16 ±  2%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.__free_one_page
      0.16 ±  3%      -0.0        0.15 ±  3%      -0.0        0.12            -0.0        0.13 ±  3%  perf-profile.children.cycles-pp.x64_sys_call
      0.15 ±  3%      -0.0        0.14 ±  3%      -0.0        0.13 ±  2%      -0.0        0.14 ±  2%  perf-profile.children.cycles-pp.flush_tlb_batched_pending
      0.15 ±  2%      -0.0        0.14 ±  3%      +0.0        0.19 ±  2%      +0.1        0.20 ±  2%  perf-profile.children.cycles-pp.mas_node_count_gfp
      0.24 ±  2%      +0.0        0.24 ±  3%      +0.0        0.24 ±  2%      +0.0        0.27 ±  6%  perf-profile.children.cycles-pp.lru_add_drain
      0.07            +0.0        0.07 ±  6%      -0.0        0.05            -0.0        0.05 ±  9%  perf-profile.children.cycles-pp.__x64_sys_mremap
      0.14 ±  3%      +0.0        0.15 ±  2%      +0.0        0.14 ±  5%      +0.0        0.14 ±  2%  perf-profile.children.cycles-pp.free_pgd_range
      0.08 ±  4%      +0.0        0.10 ±  4%      +0.0        0.08            +0.0        0.08        perf-profile.children.cycles-pp.mm_get_unmapped_area_vmflags
      0.78            +0.1        0.85            +0.1        0.84            +0.0        0.79        perf-profile.children.cycles-pp.__madvise
      0.63            +0.1        0.71            +0.1        0.70            +0.0        0.64        perf-profile.children.cycles-pp.__x64_sys_madvise
      0.63            +0.1        0.70            +0.1        0.70            +0.0        0.64        perf-profile.children.cycles-pp.do_madvise
      3.52            +0.1        3.60            +0.4        3.97            +0.5        4.03        perf-profile.children.cycles-pp.free_pgtables
      0.00            +0.1        0.09            +0.1        0.09 ±  3%      +0.0        0.00        perf-profile.children.cycles-pp.can_modify_mm_madv
      1.30            +0.2        1.46            +0.2        1.48            +0.0        1.32        perf-profile.children.cycles-pp.mas_next_slot
     88.06            +0.8       88.84            +0.9       88.91            +0.3       88.40        perf-profile.children.cycles-pp.mremap
     83.81            +1.0       84.84            +1.2       84.99            +0.5       84.28        perf-profile.children.cycles-pp.__do_sys_mremap
     85.98            +1.0       87.02            +1.1       87.07            +0.4       86.38        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     85.50            +1.1       86.56            +1.1       86.60            +0.4       85.89        perf-profile.children.cycles-pp.do_syscall_64
      2.12            +1.5        3.62            +1.5        3.61            +0.0        2.13        perf-profile.children.cycles-pp.do_munmap
     40.41            +1.5       41.93            +1.6       42.04            +0.3       40.75        perf-profile.children.cycles-pp.do_vmi_munmap
      3.62            +2.4        5.98            +2.3        5.93            +0.0        3.65        perf-profile.children.cycles-pp.mas_walk
      5.40            +3.0        8.44            +3.0        8.41            +0.1        5.47        perf-profile.children.cycles-pp.mremap_to
      5.26            +3.2        8.48            +3.2        8.44            +0.1        5.31        perf-profile.children.cycles-pp.mas_find
      0.00            +5.5        5.46            +5.4        5.42            +0.0        0.00        perf-profile.children.cycles-pp.can_modify_mm
     11.49            -0.6       10.92            -0.6       10.92            -0.1       11.41        perf-profile.self.cycles-pp.__slab_free
      4.32            -0.2        4.07            -1.1        3.26 ±  2%      -0.9        3.46        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
      1.96            -0.2        1.80 ±  4%      -0.2        1.75 ±  6%      -0.1        1.89 ±  3%  perf-profile.self.cycles-pp.__memcpy
      2.36 ±  2%      -0.1        2.24 ±  2%      -0.1        2.22 ±  3%      +0.0        2.38 ±  2%  perf-profile.self.cycles-pp.down_write
      2.42            -0.1        2.30            -0.1        2.31            +0.0        2.44        perf-profile.self.cycles-pp.rcu_cblist_dequeue
      2.33            -0.1        2.22            -0.1        2.21            -0.0        2.32        perf-profile.self.cycles-pp.mtree_load
      2.21            -0.1        2.10            -0.1        2.10            +0.0        2.22        perf-profile.self.cycles-pp.native_flush_tlb_one_user
      2.04 ±  5%      -0.1        1.95 ±  3%      -0.2        1.80 ±  3%      -0.1        1.90 ±  3%  perf-profile.self.cycles-pp.obj_cgroup_charge
      1.62            -0.1        1.54            -0.1        1.55            +0.0        1.63 ±  2%  perf-profile.self.cycles-pp.__memcg_slab_free_hook
      1.52            -0.1        1.44            -0.1        1.45            -0.0        1.50        perf-profile.self.cycles-pp.mas_wr_walk
      1.15 ±  2%      -0.1        1.07            -0.1        1.08            -0.0        1.14 ±  2%  perf-profile.self.cycles-pp.shuffle_freelist
      1.53            -0.1        1.45            -0.1        1.46            +0.0        1.53        perf-profile.self.cycles-pp.up_write
      1.44            -0.1        1.36            -0.1        1.33            -0.0        1.41        perf-profile.self.cycles-pp.__call_rcu_common
      0.70 ±  2%      -0.1        0.62            -0.1        0.64 ±  3%      -0.0        0.67 ±  2%  perf-profile.self.cycles-pp.rcu_all_qs
      1.72            -0.1        1.66            +1.0        2.68 ±  2%      +1.1        2.84        perf-profile.self.cycles-pp.mod_objcg_state
      0.51 ±  3%      -0.1        0.45            -0.0        0.47            -0.0        0.50        perf-profile.self.cycles-pp.security_mmap_addr
      2.52            -0.1        2.46            -0.2        2.36            -0.2        2.33        perf-profile.self.cycles-pp.kmem_cache_alloc_noprof
      0.94 ±  2%      -0.1        0.88 ±  4%      -0.1        0.88 ±  3%      -0.0        0.92 ±  5%  perf-profile.self.cycles-pp.vm_area_dup
      1.18            -0.1        1.12            -0.1        1.12            -0.0        1.18        perf-profile.self.cycles-pp.vma_merge
      0.89            -0.1        0.83            -0.1        0.83            -0.0        0.88        perf-profile.self.cycles-pp.___slab_alloc
      1.38            -0.1        1.33            -0.0        1.34            +0.0        1.39        perf-profile.self.cycles-pp.do_vmi_align_munmap
      0.62            -0.1        0.56 ±  2%      -0.1        0.56            -0.0        0.59        perf-profile.self.cycles-pp.mremap
      1.00            -0.1        0.95            -0.1        0.94            -0.0        0.97        perf-profile.self.cycles-pp.mas_preallocate
      0.98            -0.1        0.93            -0.0        0.94            -0.0        0.98        perf-profile.self.cycles-pp.move_ptes
      0.99            -0.1        0.94            -0.0        0.94            -0.0        0.99        perf-profile.self.cycles-pp.mas_prev_slot
      1.09            -0.0        1.04 ±  2%      -0.0        1.07            +0.0        1.14        perf-profile.self.cycles-pp.__cond_resched
      0.94            -0.0        0.90            -0.1        0.88            -0.0        0.94        perf-profile.self.cycles-pp.vm_area_free_rcu_cb
      0.85            -0.0        0.80            -0.0        0.84            +0.0        0.88        perf-profile.self.cycles-pp.mas_pop_node
      0.77            -0.0        0.72            -0.1        0.64            -0.1        0.66        perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.68            -0.0        0.63            -0.1        0.62            -0.0        0.66        perf-profile.self.cycles-pp.__split_vma
      1.17            -0.0        1.13            -0.1        1.11            +0.0        1.17        perf-profile.self.cycles-pp.clear_bhb_loop
      0.95            -0.0        0.91            -0.0        0.91            +0.0        0.95        perf-profile.self.cycles-pp.mas_leaf_max_gap
      0.79            -0.0        0.75            -0.0        0.77            +0.0        0.80        perf-profile.self.cycles-pp.mas_wr_store_entry
      0.44            -0.0        0.40            -0.0        0.41            +0.0        0.44        perf-profile.self.cycles-pp.do_munmap
      1.22            -0.0        1.18            -0.0        1.19            +0.0        1.22        perf-profile.self.cycles-pp.move_vma
      0.45            -0.0        0.42            -0.0        0.41            -0.0        0.43        perf-profile.self.cycles-pp.mas_wr_end_piv
      0.89            -0.0        0.86            -0.0        0.87            +0.0        0.90        perf-profile.self.cycles-pp.mas_store_gfp
      0.43 ±  2%      -0.0        0.40            -0.1        0.38            -0.0        0.41 ±  3%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.78            -0.0        0.75            -0.0        0.76            +0.0        0.79        perf-profile.self.cycles-pp.userfaultfd_unmap_complete
      0.66            -0.0        0.63            -0.0        0.63            -0.0        0.66        perf-profile.self.cycles-pp.mas_store_prealloc
      1.49            -0.0        1.46            -0.0        1.45 ±  2%      +0.0        1.50        perf-profile.self.cycles-pp.kmem_cache_free
      0.60            -0.0        0.58            -0.0        0.58            +0.0        0.61        perf-profile.self.cycles-pp.unmap_region
      0.86            -0.0        0.83            -0.0        0.84            +0.0        0.88        perf-profile.self.cycles-pp.move_page_tables
      0.43 ±  4%      -0.0        0.40            -0.0        0.40            -0.0        0.42        perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
      0.99            -0.0        0.97            -0.0        0.95            +0.0        1.00        perf-profile.self.cycles-pp.mt_find
      0.71            -0.0        0.68            -0.0        0.67            -0.0        0.69        perf-profile.self.cycles-pp.unmap_page_range
      0.36 ±  3%      -0.0        0.33 ±  2%      -0.0        0.34 ±  3%      +0.0        0.36 ±  2%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.55            -0.0        0.52            -0.0        0.52            +0.0        0.55        perf-profile.self.cycles-pp.get_old_pud
      0.49            -0.0        0.47            -0.0        0.47            +0.0        0.49        perf-profile.self.cycles-pp.find_vma_prev
      0.27            -0.0        0.25            -0.0        0.25            -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.mas_prev_setup
      0.41            -0.0        0.39            -0.0        0.39            +0.0        0.42        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.61            -0.0        0.58            -0.0        0.59            +0.0        0.62        perf-profile.self.cycles-pp.copy_vma
      0.37 ±  6%      -0.0        0.35 ±  2%      -0.0        0.36            -0.0        0.37        perf-profile.self.cycles-pp._raw_spin_lock
      0.47            -0.0        0.45 ±  2%      -0.0        0.46            -0.0        0.47        perf-profile.self.cycles-pp.flush_tlb_mm_range
      0.42 ±  2%      -0.0        0.40 ±  2%      -0.0        0.38 ±  2%      -0.0        0.41        perf-profile.self.cycles-pp.rcu_segcblist_enqueue
      0.27            -0.0        0.25 ±  2%      -0.0        0.24 ±  2%      -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.mas_put_in_tree
      0.44            -0.0        0.42            -0.0        0.42            +0.0        0.44        perf-profile.self.cycles-pp.mas_update_gap
      0.39            -0.0        0.37            -0.0        0.38            -0.0        0.39        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.49            -0.0        0.47            +0.0        0.50 ±  2%      +0.0        0.52        perf-profile.self.cycles-pp.refill_obj_stock
      0.27 ±  2%      -0.0        0.25 ±  2%      -0.0        0.26            -0.0        0.27        perf-profile.self.cycles-pp.tlb_finish_mmu
      0.34            -0.0        0.32            -0.0        0.32            -0.0        0.33        perf-profile.self.cycles-pp.zap_pmd_range
      0.48            -0.0        0.46            -0.0        0.48            +0.0        0.49        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.58 ±  2%      -0.0        0.56            -0.0        0.54 ±  3%      -0.1        0.48        perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
      0.28            -0.0        0.26            -0.0        0.27            +0.0        0.28 ±  2%  perf-profile.self.cycles-pp.mas_alloc_nodes
      0.24 ±  2%      -0.0        0.22            -0.0        0.22            +0.0        0.24 ±  2%  perf-profile.self.cycles-pp.mas_prev
      0.14 ±  3%      -0.0        0.12 ±  2%      -0.0        0.12            -0.0        0.12        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.52            -0.0        0.51            -0.0        0.51            +0.0        0.55        perf-profile.self.cycles-pp.mremap_to
      0.26            -0.0        0.24            -0.0        0.24            -0.0        0.26        perf-profile.self.cycles-pp.__rb_insert_augmented
      0.40            -0.0        0.39            -0.0        0.39            +0.0        0.41 ±  2%  perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.38            -0.0        0.37            -0.0        0.36            -0.0        0.38        perf-profile.self.cycles-pp.mremap_userfaultfd_complete
      0.28            -0.0        0.26 ±  3%      -0.0        0.26            -0.0        0.27 ±  2%  perf-profile.self.cycles-pp.mas_prev_range
      0.33 ±  2%      -0.0        0.32            -0.0        0.31            -0.0        0.33 ±  2%  perf-profile.self.cycles-pp.zap_pte_range
      0.28            -0.0        0.26            -0.0        0.27            +0.0        0.28        perf-profile.self.cycles-pp.flush_tlb_func
      0.22            -0.0        0.21 ±  2%      -0.0        0.20 ±  2%      -0.0        0.21        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.10            -0.0        0.09            -0.0        0.09 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.mod_node_page_state
      0.17            -0.0        0.16            -0.0        0.17 ±  2%      +0.0        0.17        perf-profile.self.cycles-pp.__thp_vma_allowable_orders
      0.44            -0.0        0.42 ±  2%      +0.1        0.50            +0.1        0.54        perf-profile.self.cycles-pp.__pte_offset_map
      0.06            -0.0        0.05            -0.1        0.00            -0.0        0.02 ±129%  perf-profile.self.cycles-pp.vma_dup_policy
      0.13 ±  3%      -0.0        0.12 ±  3%      -0.0        0.09            -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.x64_sys_call
      0.31            -0.0        0.30            -0.0        0.29            -0.0        0.29        perf-profile.self.cycles-pp.unmap_vmas
      0.10 ± 10%      -0.0        0.09 ± 12%      -0.0        0.08 ±  5%      +0.0        0.10 ± 12%  perf-profile.self.cycles-pp.vm_stat_account
      0.08 ±  5%      -0.0        0.07 ±  4%      +0.0        0.11 ±  3%      +0.0        0.12 ±  3%  perf-profile.self.cycles-pp.mas_node_count_gfp
      0.22            -0.0        0.21 ±  2%      -0.0        0.20            -0.0        0.21 ±  2%  perf-profile.self.cycles-pp.do_syscall_64
      0.11            -0.0        0.10 ±  4%      -0.0        0.10            +0.0        0.11        perf-profile.self.cycles-pp.security_vm_enough_memory_mm
      0.08            -0.0        0.08 ±  5%      -0.0        0.08 ±  4%      +0.0        0.09        perf-profile.self.cycles-pp.__vm_enough_memory
      0.07            +0.0        0.07            +0.0        0.08            +0.0        0.09 ±  3%  perf-profile.self.cycles-pp.khugepaged_enter_vma
      0.15 ±  3%      +0.0        0.16 ±  3%      +0.0        0.16 ±  3%      +0.0        0.17 ±  2%  perf-profile.self.cycles-pp.vma_to_resize
      0.56            +0.0        0.57            -0.0        0.53            -0.0        0.53        perf-profile.self.cycles-pp.__do_sys_mremap
      0.06 ±  5%      +0.0        0.07            +0.0        0.06            +0.0        0.06        perf-profile.self.cycles-pp.mm_get_unmapped_area_vmflags
      0.11 ±  4%      +0.0        0.12 ±  4%      -0.0        0.11 ±  3%      +0.0        0.12 ±  3%  perf-profile.self.cycles-pp.free_pgd_range
      0.21            +0.0        0.22 ±  2%      -0.0        0.21 ±  2%      +0.0        0.22 ±  2%  perf-profile.self.cycles-pp.thp_get_unmapped_area_vmflags
      0.45            +0.0        0.48            +0.0        0.48            -0.0        0.44        perf-profile.self.cycles-pp.do_vmi_munmap
      0.27            +0.0        0.32            +0.3        0.60            +0.4        0.62        perf-profile.self.cycles-pp.free_pgtables
      0.36 ±  2%      +0.1        0.44            +0.0        0.37 ±  2%      -0.0        0.35        perf-profile.self.cycles-pp.unlink_anon_vmas
      1.06            +0.1        1.19            +0.1        1.20            +0.0        1.08        perf-profile.self.cycles-pp.mas_next_slot
      1.49            +0.5        2.01            +0.5        1.98            +0.0        1.50        perf-profile.self.cycles-pp.mas_find
      0.00            +1.4        1.38            +1.4        1.38            +0.0        0.00        perf-profile.self.cycles-pp.can_modify_mm
      3.15            +2.1        5.23            +2.0        5.19            +0.0        3.16        perf-profile.self.cycles-pp.mas_walk


> 
> For everyone: Apologies if you're in the CC list and I didn't CC you,
> but I tried to keep my patch set's CC list relatively short and clean
> (and I focused on the active participants).
> Everyone's comments are very welcome.
> 
> [1]: https://lore.kernel.org/all/20240806212808.1885309-1-pedro.falcato@gmail.com/
> -- 
> Pedro


* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06 14:43                 ` Linus Torvalds
@ 2024-08-07 12:26                   ` Michael Ellerman
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2024-08-07 12:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Xu, Nicholas Piggin, Christophe Leroy, Pedro Falcato,
	kernel test robot, Jeff Xu, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Kees Cook, Liam R. Howlett, Dave Hansen,
	Greg Kroah-Hartman, Guenter Roeck, Jann Horn, Jonathan Corbet,
	Jorge Lucangeli Obes, Matthew Wilcox, Muhammad Usama Anjum,
	Stephen Röttger, Suren Baghdasaryan, Amer Al Shanawany,
	Javier Carrasco, Shuah Khan, linux-api, linux-mm, ying.huang,
	feng.tang, fengwei.yin

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Tue, 6 Aug 2024 at 05:03, Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> Or should I turn it into a series and post it?
>
> I think post it as a single working patch rather than as a series that
> breaks things and then fixes it.

It splits nicely with no breakage along the way.

> And considering that you did all the testing and found the problems,
> just take ownership of it and make it a "Suggested-by: Linus" or
> something.

Sure.

cheers


* Re: [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression
  2024-08-06  2:01                 ` Michael Ellerman
  2024-08-06  2:15                   ` Linus Torvalds
@ 2024-09-13  5:47                   ` Christophe Leroy
  1 sibling, 0 replies; 29+ messages in thread
From: Christophe Leroy @ 2024-09-13  5:47 UTC (permalink / raw)
  To: Michael Ellerman, Linus Torvalds, Nicholas Piggin
  Cc: Jeff Xu, Pedro Falcato, kernel test robot, Jeff Xu, oe-lkp, lkp,
	linux-kernel, Andrew Morton, Kees Cook, Liam R. Howlett,
	Dave Hansen, Greg Kroah-Hartman, Guenter Roeck, Jann Horn,
	Jonathan Corbet, Jorge Lucangeli Obes, Matthew Wilcox,
	Muhammad Usama Anjum, Stephen Röttger, Suren Baghdasaryan,
	Amer Al Shanawany, Javier Carrasco, Shuah Khan, linux-api,
	linux-mm, ying.huang, feng.tang, fengwei.yin



On 06/08/2024 at 04:01, Michael Ellerman wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>> On Mon, 5 Aug 2024 at 16:25, Nicholas Piggin <npiggin@gmail.com> wrote:
>>>
>>> Can userspace on other archs not unmap their vdsos?
>>
>> I think they can, and nobody cares. The "context.vdso" value stays at
>> some stale value, and anybody who tries to use it will just fail.
>>
>> So what makes powerpc special is not "you can unmap the vdso", but
>> "powerpc cares".
>>
>> I just don't quite know _why_ powerpc cares.
> 
> AFAIK for CRIU the problem is signal delivery:
> 
> arch/powerpc/kernel/signal_64.c:
> 
> int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
> 		struct task_struct *tsk)
> {
>          ...
> 	/* Set up to return from userspace. */
> 	if (tsk->mm->context.vdso) {
> 		regs_set_return_ip(regs, VDSO64_SYMBOL(tsk->mm->context.vdso, sigtramp_rt64));
> 
> 
> ie. if the VDSO is moved but mm->context.vdso is not updated, signal
> delivery will crash in userspace.
> 
> x86-64 always uses SA_RESTORER, and arm64 & s390 can use SA_RESTORER, so
> I think CRIU uses that to avoid problems with signal delivery when the
> VDSO is moved.
> 
> riscv doesn't support SA_RESTORER but I guess CRIU doesn't support riscv
> yet so it's not become a problem.
> 
> There was a patch to support SA_RESTORER on powerpc, but I balked at
> merging it because I couldn't find anyone on the glibc side to say
> whether they wanted it or not. I guess I should have just merged it.

The patch is at 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/afe50d1db63a10fde9547ea08fe1fa68b0638aba.1624618157.git.christophe.leroy@csgroup.eu/

It still applies cleanly.

Christophe


> 
> There was an attempt to unify all the vdso stuff and handle the
> VDSO mremap case in generic code:
> 
>    https://lore.kernel.org/lkml/20210611180242.711399-17-dima@arista.com/
> 
> But I think that series got a bit big and complicated and Dmitry had to
> move on to other things.
> 
> cheers
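
The failure mode described above (a moved vDSO with a stale
mm->context.vdso, and SA_RESTORER as the way around it) can be sketched
as a small userspace model. This is only an illustration under stated
assumptions, not kernel code: struct mm_model, struct sigaction_model,
SIGTRAMP_OFFSET, and all three functions are hypothetical names invented
for the example, and the fallback the real handler uses when no vDSO is
recorded is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-mm state; NOT the real kernel type. */
struct mm_model {
	uintptr_t recorded_vdso; /* models mm->context.vdso; 0 means "none" */
	uintptr_t actual_vdso;   /* where the vDSO is really mapped */
};

/* Hypothetical stand-in for the handler registration. */
struct sigaction_model {
	bool has_restorer;   /* models SA_RESTORER being set */
	uintptr_t restorer;  /* models the userspace-supplied restorer */
};

/* Made-up offset of the signal trampoline inside the vDSO. */
#define SIGTRAMP_OFFSET ((uintptr_t)0x400)

/* Pick the address a signal handler returns to. */
uintptr_t signal_return_ip(const struct mm_model *mm,
			   const struct sigaction_model *sa)
{
	if (sa->has_restorer)
		return sa->restorer; /* independent of the vDSO */
	if (mm->recorded_vdso)
		return mm->recorded_vdso + SIGTRAMP_OFFSET;
	return 0; /* real fallback path not modelled here */
}

/*
 * Model an mremap() of the vDSO. update_context says whether the
 * recorded base follows the move (an arch that "cares") or goes stale.
 */
void move_vdso(struct mm_model *mm, uintptr_t new_base, bool update_context)
{
	assert(new_base != 0);
	mm->actual_vdso = new_base;
	if (update_context)
		mm->recorded_vdso = new_base;
}

/* True if the chosen return address points at live trampoline code. */
bool return_ip_is_valid(const struct mm_model *mm,
			const struct sigaction_model *sa)
{
	uintptr_t ip = signal_return_ip(mm, sa);

	if (sa->has_restorer)
		return ip == sa->restorer;
	return ip == mm->actual_vdso + SIGTRAMP_OFFSET;
}
```

In this model, moving the vDSO without updating the recorded base makes
the return address invalid unless the handler was registered with a
restorer, which matches the reason given above for CRIU relying on
SA_RESTORER where it is available.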


Thread overview: 29+ messages
2024-08-04  8:59 [linus:master] [mseal] 8be7258aad: stress-ng.pagemove.page_remaps_per_sec -4.4% regression kernel test robot
2024-08-04 20:32 ` Linus Torvalds
2024-08-05 13:33   ` Pedro Falcato
2024-08-05 18:10     ` Jeff Xu
2024-08-05 18:55       ` Linus Torvalds
2024-08-05 19:33         ` Linus Torvalds
2024-08-06  2:14           ` Michael Ellerman
2024-08-06  2:17             ` Linus Torvalds
2024-08-06 12:03               ` Michael Ellerman
2024-08-06 14:43                 ` Linus Torvalds
2024-08-07 12:26                   ` Michael Ellerman
2024-08-06  6:04           ` Oliver Sang
2024-08-06 14:38             ` Linus Torvalds
2024-08-06 21:37             ` Pedro Falcato
2024-08-07  5:54               ` Oliver Sang
2024-08-05 19:37         ` Jeff Xu
2024-08-05 19:48           ` Linus Torvalds
2024-08-05 19:50             ` Linus Torvalds
2024-08-05 23:24             ` Nicholas Piggin
2024-08-06  0:13               ` Linus Torvalds
2024-08-06  1:22                 ` Jeff Xu
2024-08-06  2:01                 ` Michael Ellerman
2024-08-06  2:15                   ` Linus Torvalds
2024-09-13  5:47                   ` Christophe Leroy
2024-08-05 17:54   ` Jeff Xu
2024-08-05 13:56 ` Jeff Xu
2024-08-05 16:58 ` Jeff Xu
2024-08-06  1:44   ` Oliver Sang
2024-08-06 14:54     ` Jeff Xu
