* [robertosassu:evm-iint-ptr-v1-devel-v3] [evm] e38e699a42: will-it-scale.per_process_ops 160.4% improvement
@ 2025-02-17 6:45 kernel test robot
2025-02-17 7:58 ` Mateusz Guzik
0 siblings, 1 reply; 3+ messages in thread
From: kernel test robot @ 2025-02-17 6:45 UTC (permalink / raw)
To: Roberto Sassu; +Cc: oe-lkp, lkp, linux-integrity, oliver.sang
Hello,
kernel test robot noticed a 160.4% improvement of will-it-scale.per_process_ops on:
commit: e38e699a42b4db5daf7dac453759fdc8ba0dab31 ("evm: Move metadata in the inode security blob to a pointer")
https://github.com/robertosassu/linux evm-iint-ptr-v1-devel-v3
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
parameters:
nr_task: 100%
mode: process
test: open3
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250217/202502171412.ec2e5b88-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/open3/will-it-scale
commit:
d7a782797d ("ima: Reset IMA_NONACTION_RULE_FLAGS after post_setattr")
e38e699a42 ("evm: Move metadata in the inode security blob to a pointer")
d7a782797df7c64e e38e699a42b4db5daf7dac45375
---------------- ---------------------------
%stddev %change %stddev
\ | \
194348 ± 2% +42.5% 276904 ± 9% cpuidle..usage
2032 +69.8% 3451 ± 8% vmstat.system.cs
219050 ± 22% -53.4% 101970 ± 46% meminfo.AnonHugePages
224488 ± 6% -14.1% 192753 ± 6% meminfo.Mapped
1151621 -10.9% 1026208 meminfo.Shmem
0.62 ± 3% +0.1 0.70 ± 2% mpstat.cpu.all.idle%
0.16 ± 4% +0.0 0.20 ± 2% mpstat.cpu.all.usr%
3.00 +1650.0% 52.50 mpstat.max_utilization.seconds
1221925 ± 7% -16.7% 1018311 ± 12% numa-meminfo.node1.FilePages
123310 ± 9% -25.6% 91721 ± 12% numa-meminfo.node1.Mapped
1085554 -18.7% 882893 ± 12% numa-meminfo.node1.Shmem
305498 ± 7% -16.7% 254488 ± 12% numa-vmstat.node1.nr_file_pages
31041 ± 9% -25.1% 23250 ± 12% numa-vmstat.node1.nr_mapped
271405 -18.7% 220634 ± 12% numa-vmstat.node1.nr_shmem
620152 ± 2% +160.4% 1614732 ± 2% will-it-scale.224.processes
0.13 +46.2% 0.19 ± 4% will-it-scale.224.processes_idle
2768 ± 2% +160.4% 7208 ± 2% will-it-scale.per_process_ops
620152 ± 2% +160.4% 1614732 ± 2% will-it-scale.workload
32.33 ± 13% +1535.6% 528.83 ± 12% perf-c2c.DRAM.local
3917 ± 6% +54.6% 6054 ± 5% perf-c2c.DRAM.remote
5456 ± 7% +25.3% 6835 ± 7% perf-c2c.HITM.local
3562 ± 7% +22.8% 4376 ± 5% perf-c2c.HITM.remote
9019 ± 7% +24.3% 11212 ± 6% perf-c2c.HITM.total
487248 -6.8% 454278 proc-vmstat.nr_active_anon
107.00 ± 22% -53.4% 49.84 ± 46% proc-vmstat.nr_anon_transparent_hugepages
1164233 -2.7% 1133175 proc-vmstat.nr_file_pages
56483 ± 5% -14.1% 48516 ± 6% proc-vmstat.nr_mapped
287834 -10.8% 256778 proc-vmstat.nr_shmem
487248 -6.8% 454278 proc-vmstat.nr_zone_active_anon
25013 ± 34% +124.4% 56137 ± 5% proc-vmstat.numa_hint_faults
11595 ± 47% +102.3% 23462 ± 38% proc-vmstat.numa_hint_faults_local
1728407 -1.8% 1697091 proc-vmstat.numa_hit
1493274 -1.7% 1467674 proc-vmstat.numa_local
179326 ± 15% +70.1% 305113 ± 2% proc-vmstat.numa_pte_updates
1881688 -4.2% 1802948 proc-vmstat.pgalloc_normal
1350178 +3.0% 1390538 proc-vmstat.pgfault
64622 ± 17% +29.0% 83367 ± 6% proc-vmstat.pgreuse
46897964 ± 2% +23.2% 57779828 ± 5% sched_debug.cfs_rq:/.avg_vruntime.max
30929943 +10.8% 34256347 sched_debug.cfs_rq:/.avg_vruntime.min
1297707 ± 7% +112.5% 2758168 ± 11% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.75 ± 11% -29.6% 0.53 ± 11% sched_debug.cfs_rq:/.h_nr_running.min
3452 ± 10% -29.5% 2433 ± 11% sched_debug.cfs_rq:/.load.min
3.00 ± 11% -29.6% 2.11 ± 10% sched_debug.cfs_rq:/.load_avg.min
46897964 ± 2% +23.2% 57779828 ± 5% sched_debug.cfs_rq:/.min_vruntime.max
30929943 +10.8% 34256347 sched_debug.cfs_rq:/.min_vruntime.min
1297707 ± 7% +112.5% 2758169 ± 11% sched_debug.cfs_rq:/.min_vruntime.stddev
0.75 ± 11% -29.6% 0.53 ± 11% sched_debug.cfs_rq:/.nr_running.min
761.36 ± 10% -27.3% 553.17 ± 7% sched_debug.cfs_rq:/.runnable_avg.min
673.31 ± 12% -24.7% 507.00 ± 9% sched_debug.cfs_rq:/.util_avg.min
407.66 +107.2% 844.62 sched_debug.cfs_rq:/.util_est.avg
755.11 ± 6% +98.5% 1498 ± 8% sched_debug.cfs_rq:/.util_est.max
7.03 ±147% +6359.7% 453.97 ± 21% sched_debug.cfs_rq:/.util_est.min
972664 -69.0% 301696 ± 18% sched_debug.cpu.avg_idle.avg
1514330 ± 4% -51.5% 734068 ± 11% sched_debug.cpu.avg_idle.max
161784 ± 3% -97.2% 4568 ± 49% sched_debug.cpu.avg_idle.min
26.03 ± 4% -13.2% 22.60 ± 8% sched_debug.cpu.clock.stddev
5158 ± 7% -25.0% 3867 ± 18% sched_debug.cpu.curr->pid.min
2243 +41.7% 3179 ± 4% sched_debug.cpu.nr_switches.avg
783.47 ± 4% +45.8% 1142 ± 2% sched_debug.cpu.nr_switches.min
0.18 -14.7% 0.16 ± 2% perf-stat.i.MPKI
1.286e+10 +143.1% 3.126e+10 perf-stat.i.branch-instructions
0.11 -0.0 0.06 perf-stat.i.branch-miss-rate%
12831155 +25.5% 16098495 perf-stat.i.branch-misses
30.76 ± 2% +3.9 34.69 ± 2% perf-stat.i.cache-miss-rate%
9505547 +150.7% 23827336 ± 2% perf-stat.i.cache-misses
32785522 ± 2% +112.8% 69779552 perf-stat.i.cache-references
1988 +71.3% 3406 ± 8% perf-stat.i.context-switches
12.29 -66.5% 4.12 perf-stat.i.cpi
266.62 +7.4% 286.36 perf-stat.i.cpu-migrations
69838 -60.9% 27322 ± 2% perf-stat.i.cycles-between-cache-misses
5.257e+10 +198.1% 1.567e+11 perf-stat.i.instructions
0.09 +188.2% 0.25 perf-stat.i.ipc
0.18 -15.8% 0.15 ± 2% perf-stat.overall.MPKI
0.10 -0.0 0.05 perf-stat.overall.branch-miss-rate%
28.59 +5.5 34.11 ± 2% perf-stat.overall.cache-miss-rate%
12.31 -66.5% 4.13 perf-stat.overall.cpi
68150 -60.2% 27152 ± 2% perf-stat.overall.cycles-between-cache-misses
0.08 +198.3% 0.24 perf-stat.overall.ipc
25547531 ± 2% +14.4% 29237729 perf-stat.overall.path-length
1.282e+10 +143.1% 3.116e+10 perf-stat.ps.branch-instructions
12599626 +27.1% 16009955 perf-stat.ps.branch-misses
9468988 +151.0% 23763879 ± 2% perf-stat.ps.cache-misses
33116170 +110.4% 69681456 perf-stat.ps.cache-references
1970 +72.2% 3393 ± 8% perf-stat.ps.context-switches
262.57 +8.5% 284.83 perf-stat.ps.cpu-migrations
5.239e+10 +198.1% 1.562e+11 perf-stat.ps.instructions
4015 +2.3% 4106 perf-stat.ps.minor-faults
4015 +2.3% 4106 perf-stat.ps.page-faults
1.583e+13 +198.0% 4.719e+13 perf-stat.total.instructions
3.48 ± 7% -88.4% 0.40 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
3.63 ± 11% -94.0% 0.22 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
0.13 ± 3% -74.7% 0.03 ± 7% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.02 ±108% +1600.0% 0.28 ± 35% perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.77 ± 4% -45.9% 0.42 ± 18% perf-sched.sch_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
3.65 ± 4% -93.0% 0.25 ±222% perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
0.09 ± 47% -97.4% 0.00 ±143% perf-sched.sch_delay.avg.ms.__cond_resched.kvfree_rcu_drain_ready.kfree_rcu_monitor.process_one_work.worker_thread
3.50 ± 16% -92.6% 0.26 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
0.05 ± 10% -74.1% 0.01 ± 85% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.87 ± 11% -26.2% 0.64 ± 16% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.61 ±103% -99.3% 0.00 ± 14% perf-sched.sch_delay.avg.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
0.07 ± 16% -89.8% 0.01 ± 11% perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
3.83 -27.8% 2.76 ± 9% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
3.40 ± 19% -90.7% 0.32 ± 38% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
0.12 ± 35% -67.6% 0.04 ± 42% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.53 ± 16% -77.4% 0.12 ± 64% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
0.14 ± 37% +97.9% 0.28 ± 10% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.06 ± 35% -88.2% 0.01 ± 70% perf-sched.sch_delay.avg.ms.schedule_timeout.msleep.ast_astdp_connector_helper_detect_ctx.drm_helper_probe_detect_ctx
0.07 ± 4% -49.5% 0.04 ± 50% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.02 ± 4% -65.4% 0.01 ± 23% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.74 ± 5% -65.9% 0.25 ± 67% perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.23 ± 45% -83.4% 0.04 ± 96% perf-sched.sch_delay.avg.ms.usleep_range_state.ipmi_thread.kthread.ret_from_fork
4.13 -83.9% 0.67 ±223% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
4.14 -84.0% 0.66 ±223% perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
0.03 ±156% +21902.3% 6.42 ± 77% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
18.15 ± 42% -74.4% 4.64 ± 30% perf-sched.sch_delay.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
4.15 -83.9% 0.67 ±223% perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
0.10 ± 39% -97.6% 0.00 ±143% perf-sched.sch_delay.max.ms.__cond_resched.kvfree_rcu_drain_ready.kfree_rcu_monitor.process_one_work.worker_thread
4.09 -83.7% 0.67 ±223% perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
3.28 +13.5% 3.72 ± 6% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
1.05 ±119% -99.6% 0.00 ± 8% perf-sched.sch_delay.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
0.16 ± 9% -92.7% 0.01 ± 21% perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
13.59 ± 16% -64.6% 4.81 ± 41% perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
3.98 ± 4% -80.3% 0.78 ± 42% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
0.06 ± 81% -92.7% 0.00 ± 83% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
0.09 ± 50% -88.3% 0.01 ±101% perf-sched.sch_delay.max.ms.schedule_timeout.msleep.ast_astdp_connector_helper_detect_ctx.drm_helper_probe_detect_ctx
0.20 ± 9% +2615.7% 5.51 ± 60% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
2.40 ± 36% +53.4% 3.68 ± 8% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
144.41 ± 3% -31.7% 98.62 ± 7% perf-sched.total_wait_and_delay.average.ms
10694 ± 2% +33.0% 14221 ± 5% perf-sched.total_wait_and_delay.count.ms
143.83 ± 3% -31.8% 98.07 ± 7% perf-sched.total_wait_time.average.ms
3.50 ± 5% +85.0% 6.47 ± 12% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
411.43 ± 20% -58.3% 171.38 ± 70% perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.87 ± 11% -24.7% 0.66 ± 19% perf-sched.wait_and_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
683.93 ± 24% -22.0% 533.76 ± 35% perf-sched.wait_and_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
1.90 ± 6% -17.9% 1.56 ± 13% perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.66 -27.8% 5.53 ± 9% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
188.08 ± 32% -57.5% 79.88 ± 23% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
1.39 ± 16% +20.1% 1.67 ± 7% perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
2265 -96.8% 72.33 ± 46% perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
123.00 -24.0% 93.50 ± 9% perf-sched.wait_and_delay.count.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
102.00 +9.5% 111.67 ± 2% perf-sched.wait_and_delay.count.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
409.00 ± 25% +135.5% 963.33 ± 27% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
2450 ± 5% -81.0% 466.00 ± 76% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
1003 +348.9% 4506 ± 23% perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
3.28 +31.6% 4.31 ± 33% perf-sched.wait_and_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
143.48 ±139% -99.7% 0.40 ±223% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
116.07 ±216% -99.8% 0.22 ±223% perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
3.37 ± 5% +91.1% 6.44 ± 12% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.02 ±108% +3587.0% 0.61 ± 34% perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
361.12 ±160% -99.9% 0.26 ±223% perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
411.38 ± 20% -54.1% 188.90 ± 51% perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
683.63 ± 24% -22.0% 533.55 ± 35% perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.79 ± 76% -74.2% 0.20 perf-sched.wait_time.avg.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
3.83 -27.8% 2.76 ± 9% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
3.40 ± 19% -90.2% 0.33 ± 43% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
187.96 ± 32% -57.5% 79.84 ± 23% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.02 ± 48% -91.7% 0.00 ±223% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
0.35 ± 30% -37.1% 0.22 ± 18% perf-sched.wait_time.avg.ms.usleep_range_state.ipmi_thread.kthread.ret_from_fork
1386 ±141% -100.0% 0.67 ±223% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
790.56 ±222% -99.9% 0.66 ±223% perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
1002 +349.5% 4505 ± 23% perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.03 ±156% +48113.7% 14.06 ± 79% perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
1464 ±141% -100.0% 0.67 ±223% perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
1.21 ± 98% -83.2% 0.20 perf-sched.wait_time.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
0.05 ± 47% -96.2% 0.00 ±223% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
2.40 ± 36% +53.4% 3.68 ± 8% perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
75.05 -72.6 2.42 ± 10% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
75.05 -72.6 2.43 ± 10% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
75.14 -72.6 2.57 ± 9% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
75.14 -72.6 2.58 ± 9% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
75.16 -72.6 2.61 ± 9% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
75.16 -72.5 2.62 ± 9% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
75.18 -72.5 2.66 ± 9% perf-profile.calltrace.cycles-pp.open64
50.23 -48.6 1.66 ± 12% perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
25.20 -25.2 0.00 perf-profile.calltrace.cycles-pp.lockref_get.do_dentry_open.vfs_open.do_open.path_openat
25.16 -25.2 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.lockref_get.do_dentry_open.vfs_open.do_open
25.16 -25.2 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.lockref_get.do_dentry_open.vfs_open
24.88 -24.9 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk
24.87 -24.9 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.lockref_get_not_dead.__legitimize_path.try_to_unlazy
24.73 -24.7 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.dput.terminate_walk.path_openat.do_filp_open
24.72 -24.7 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.dput.terminate_walk.path_openat
24.92 -24.5 0.46 ± 45% perf-profile.calltrace.cycles-pp.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk.do_open
24.46 -24.5 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.dput.__fput.__x64_sys_close.do_syscall_64
24.92 -24.5 0.46 ± 45% perf-profile.calltrace.cycles-pp.__legitimize_path.try_to_unlazy.complete_walk.do_open.path_openat
24.93 -24.5 0.47 ± 45% perf-profile.calltrace.cycles-pp.try_to_unlazy.complete_walk.do_open.path_openat.do_filp_open
24.93 -24.5 0.47 ± 45% perf-profile.calltrace.cycles-pp.complete_walk.do_open.path_openat.do_filp_open.do_sys_openat2
24.45 -24.5 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.dput.__fput.__x64_sys_close
25.28 -24.2 1.05 ± 17% perf-profile.calltrace.cycles-pp.do_dentry_open.vfs_open.do_open.path_openat.do_filp_open
25.28 -24.2 1.07 ± 17% perf-profile.calltrace.cycles-pp.vfs_open.do_open.path_openat.do_filp_open.do_sys_openat2
24.76 -24.2 0.60 ± 6% perf-profile.calltrace.cycles-pp.dput.terminate_walk.path_openat.do_filp_open.do_sys_openat2
24.76 -24.1 0.61 ± 6% perf-profile.calltrace.cycles-pp.terminate_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
24.50 -23.9 0.57 ± 7% perf-profile.calltrace.cycles-pp.dput.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.5 0.54 ± 6% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.osq_lock.__mutex_lock
0.00 +0.6 0.55 ± 5% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.osq_lock.__mutex_lock.evm_file_release
0.00 +0.6 0.56 ± 7% perf-profile.calltrace.cycles-pp.lockref_put_return.dput.__fput.__x64_sys_close.do_syscall_64
0.00 +0.6 0.58 ± 5% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.osq_lock.__mutex_lock.evm_file_release.security_file_release
0.00 +0.6 0.60 ± 7% perf-profile.calltrace.cycles-pp.lockref_put_return.dput.terminate_walk.path_openat.do_filp_open
24.57 +72.2 96.75 perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
24.63 +72.4 97.02 perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
24.64 +72.4 97.09 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
24.65 +72.5 97.10 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
24.66 +72.5 97.15 perf-profile.calltrace.cycles-pp.__close
0.00 +95.1 95.15 perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.evm_file_release.security_file_release.__fput
0.00 +95.5 95.52 perf-profile.calltrace.cycles-pp.__mutex_lock.evm_file_release.security_file_release.__fput.__x64_sys_close
0.00 +95.7 95.71 perf-profile.calltrace.cycles-pp.evm_file_release.security_file_release.__fput.__x64_sys_close.do_syscall_64
0.00 +95.9 95.87 perf-profile.calltrace.cycles-pp.security_file_release.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
99.26 -99.3 0.00 perf-profile.children.cycles-pp._raw_spin_lock
99.23 -99.2 0.00 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
75.05 -72.6 2.42 ± 10% perf-profile.children.cycles-pp.path_openat
75.05 -72.6 2.43 ± 10% perf-profile.children.cycles-pp.do_filp_open
75.14 -72.6 2.58 ± 9% perf-profile.children.cycles-pp.do_sys_openat2
75.14 -72.6 2.58 ± 9% perf-profile.children.cycles-pp.__x64_sys_openat
75.18 -72.5 2.68 ± 9% perf-profile.children.cycles-pp.open64
50.23 -48.6 1.66 ± 12% perf-profile.children.cycles-pp.do_open
49.26 -48.1 1.17 ± 6% perf-profile.children.cycles-pp.dput
25.20 -24.8 0.44 ± 6% perf-profile.children.cycles-pp.lockref_get
24.92 -24.4 0.54 ± 7% perf-profile.children.cycles-pp.lockref_get_not_dead
24.93 -24.4 0.54 ± 7% perf-profile.children.cycles-pp.__legitimize_path
24.93 -24.4 0.56 ± 7% perf-profile.children.cycles-pp.complete_walk
24.93 -24.4 0.55 ± 7% perf-profile.children.cycles-pp.try_to_unlazy
25.28 -24.2 1.06 ± 17% perf-profile.children.cycles-pp.do_dentry_open
25.29 -24.2 1.07 ± 17% perf-profile.children.cycles-pp.vfs_open
24.76 -24.1 0.61 ± 6% perf-profile.children.cycles-pp.terminate_walk
99.88 -0.1 99.80 perf-profile.children.cycles-pp.do_syscall_64
99.88 -0.1 99.81 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.15 ± 4% +0.0 0.17 ± 2% perf-profile.children.cycles-pp.task_tick_fair
0.24 ± 4% +0.0 0.26 perf-profile.children.cycles-pp.sched_tick
0.38 ± 5% +0.0 0.42 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.38 ± 5% +0.0 0.42 ± 4% perf-profile.children.cycles-pp.tick_nohz_handler
0.35 ± 4% +0.0 0.39 ± 3% perf-profile.children.cycles-pp.update_process_times
0.00 +0.1 0.05 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.00 +0.1 0.05 perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.alloc_empty_file
0.00 +0.1 0.05 ± 13% perf-profile.children.cycles-pp.mutex_spin_on_owner
0.51 ± 6% +0.1 0.56 ± 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.00 +0.1 0.06 ± 16% perf-profile.children.cycles-pp.evm_iint_find
0.00 +0.1 0.06 perf-profile.children.cycles-pp.link_path_walk
0.00 +0.1 0.07 ± 5% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.00 +0.1 0.09 perf-profile.children.cycles-pp.getname_flags
0.00 +0.1 0.11 ± 8% perf-profile.children.cycles-pp.osq_unlock
0.00 +0.1 0.12 ± 4% perf-profile.children.cycles-pp.mutex_unlock
0.00 +0.1 0.13 ± 14% perf-profile.children.cycles-pp.mutex_lock
0.05 +1.1 1.16 ± 6% perf-profile.children.cycles-pp.lockref_put_return
24.57 +72.2 96.76 perf-profile.children.cycles-pp.__fput
24.63 +72.4 97.02 perf-profile.children.cycles-pp.__x64_sys_close
24.67 +72.5 97.17 perf-profile.children.cycles-pp.__close
0.00 +95.2 95.17 perf-profile.children.cycles-pp.osq_lock
0.00 +95.5 95.53 perf-profile.children.cycles-pp.__mutex_lock
0.00 +95.7 95.71 perf-profile.children.cycles-pp.evm_file_release
0.00 +95.9 95.87 perf-profile.children.cycles-pp.security_file_release
98.69 -98.7 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.00 +0.1 0.05 ± 7% perf-profile.self.cycles-pp.mutex_spin_on_owner
0.00 +0.1 0.05 ± 13% perf-profile.self.cycles-pp.evm_iint_find
0.00 +0.1 0.11 ± 8% perf-profile.self.cycles-pp.osq_unlock
0.00 +0.1 0.11 ± 4% perf-profile.self.cycles-pp.mutex_unlock
0.00 +0.1 0.12 ± 13% perf-profile.self.cycles-pp.mutex_lock
0.06 +0.2 0.22 ± 23% perf-profile.self.cycles-pp.__fput
0.00 +0.2 0.19 ± 7% perf-profile.self.cycles-pp.__mutex_lock
0.00 +0.4 0.43 ± 6% perf-profile.self.cycles-pp.lockref_get
0.07 ± 5% +0.5 0.59 ± 29% perf-profile.self.cycles-pp.do_dentry_open
0.00 +0.5 0.54 ± 7% perf-profile.self.cycles-pp.lockref_get_not_dead
0.05 +1.1 1.16 ± 7% perf-profile.self.cycles-pp.lockref_put_return
0.00 +94.6 94.60 perf-profile.self.cycles-pp.osq_lock
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [robertosassu:evm-iint-ptr-v1-devel-v3] [evm] e38e699a42: will-it-scale.per_process_ops 160.4% improvement
2025-02-17 6:45 [robertosassu:evm-iint-ptr-v1-devel-v3] [evm] e38e699a42: will-it-scale.per_process_ops 160.4% improvement kernel test robot
@ 2025-02-17 7:58 ` Mateusz Guzik
2025-02-17 9:58 ` Roberto Sassu
0 siblings, 1 reply; 3+ messages in thread
From: Mateusz Guzik @ 2025-02-17 7:58 UTC (permalink / raw)
To: Roberto Sassu; +Cc: kernel test robot, oe-lkp, lkp, linux-integrity
On Mon, Feb 17, 2025 at 02:45:23PM +0800, kernel test robot wrote:
> kernel test robot noticed a 160.4% improvement of will-it-scale.per_process_ops on:
>
>
> commit: e38e699a42b4db5daf7dac453759fdc8ba0dab31 ("evm: Move metadata in the inode security blob to a pointer")
> https://github.com/robertosassu/linux evm-iint-ptr-v1-devel-v3
>
> 24.57 +72.2 96.76 perf-profile.children.cycles-pp.__fput
> 24.63 +72.4 97.02 perf-profile.children.cycles-pp.__x64_sys_close
> 24.67 +72.5 97.17 perf-profile.children.cycles-pp.__close
> 0.00 +95.2 95.17 perf-profile.children.cycles-pp.osq_lock
> 0.00 +95.5 95.53 perf-profile.children.cycles-pp.__mutex_lock
> 0.00 +95.7 95.71 perf-profile.children.cycles-pp.evm_file_release
> 0.00 +95.9 95.87 perf-profile.children.cycles-pp.security_file_release
> 98.69 -98.7 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
Contrary to what's indicated in the report, this change is in fact a
significant slowdown (or rather, will be when other problems get fixed).
The open3 microbenchmark issues open + close in a loop on the same file.
On the stock kernel the some of the problem is false-sharing within
struct inode.
The biggest bottleneck is lockref manipulation:
- there is lockref acquire and release happening *twice* instead of just
once
- the lockref facility is prone to degrading to operation under a
spinlock and staying there when microbenchmarked. you can see on the
profile this does happen here
evm also used to pop up, which I patched away in 699ae6241920b0fa ("evm:
stop avoidably reading i_writecount in evm_file_release")
Your patch adds a mutex which adds 2 atomics to the fast path (so slows
down single-threaded operation) and more importantly adds a
serialization point for multithreaded operation.
In this case the resulting contention helps decrease the loss of
performance in lockref and that's how there is an apparent win.
I have a WIP patch to move dentries away from using lockref, which will
in turn avoid the degradation. Should it land, the mutex added here will
be the new bottleneck.
It needs to be avoided by default. Do you *need* to test the condition
in evm_file_release() with the lock held? Perhaps initial test can be
done without and redone after acquiring it?
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [robertosassu:evm-iint-ptr-v1-devel-v3] [evm] e38e699a42: will-it-scale.per_process_ops 160.4% improvement
2025-02-17 7:58 ` Mateusz Guzik
@ 2025-02-17 9:58 ` Roberto Sassu
0 siblings, 0 replies; 3+ messages in thread
From: Roberto Sassu @ 2025-02-17 9:58 UTC (permalink / raw)
To: Mateusz Guzik, Roberto Sassu
Cc: kernel test robot, oe-lkp, lkp, linux-integrity
On Mon, 2025-02-17 at 08:58 +0100, Mateusz Guzik wrote:
> On Mon, Feb 17, 2025 at 02:45:23PM +0800, kernel test robot wrote:
> > kernel test robot noticed a 160.4% improvement of will-it-scale.per_process_ops on:
> >
> >
> > commit: e38e699a42b4db5daf7dac453759fdc8ba0dab31 ("evm: Move metadata in the inode security blob to a pointer")
> > https://github.com/robertosassu/linux evm-iint-ptr-v1-devel-v3
> >
> > 24.57 +72.2 96.76 perf-profile.children.cycles-pp.__fput
> > 24.63 +72.4 97.02 perf-profile.children.cycles-pp.__x64_sys_close
> > 24.67 +72.5 97.17 perf-profile.children.cycles-pp.__close
> > 0.00 +95.2 95.17 perf-profile.children.cycles-pp.osq_lock
> > 0.00 +95.5 95.53 perf-profile.children.cycles-pp.__mutex_lock
> > 0.00 +95.7 95.71 perf-profile.children.cycles-pp.evm_file_release
> > 0.00 +95.9 95.87 perf-profile.children.cycles-pp.security_file_release
> > 98.69 -98.7 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>
> Contrary to what's indicated in the report, this change is in fact a
> significant slowdown (or rather, will be when other problems get fixed).
>
> The open3 microbenchmark issues open + close in a loop on the same file.
>
> On the stock kernel the some of the problem is false-sharing within
> struct inode.
>
> The biggest bottleneck is lockref manipulation:
> - there is lockref acquire and release happening *twice* instead of just
> once
> - the lockref facility is prone to degrading to operation under a
> spinlock and staying there when microbenchmarked. you can see on the
> profile this does happen here
>
> evm also used to pop up, which I patched away in 699ae6241920b0fa ("evm:
> stop avoidably reading i_writecount in evm_file_release")
>
> Your patch adds a mutex which adds 2 atomics to the fast path (so slows
> down single-threaded operation) and more importantly adds a
> serialization point for multithreaded operation.
>
> In this case the resulting contention helps decrease the loss of
> performance in lockref and that's how there is an apparent win.
Hi Mateusz
thanks for the explanation!
> I have a WIP patch to move dentries away from using lockref, which will
> in turn avoid the degradation. Should it land, the mutex added here will
> be the new bottleneck.
>
> It needs to be avoided by default. Do you *need* to test the condition
> in evm_file_release() with the lock held? Perhaps initial test can be
> done without and redone after acquiring it?
This patch was more an explorative work to see what challenges we
encounter to move away from embedding the structure in the inode
security blob, and just use a pointer.
Currently, there is no gain in switching, since the requested blob size
remains 40 bytes (due to adding the mutex).
Certainly it is possible to do a test without a mutex and then redoing
it. If the EVM_NEW_FILE flag is not set, we can avoid to take the lock.
Thanks
Roberto
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-02-17 9:59 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-17 6:45 [robertosassu:evm-iint-ptr-v1-devel-v3] [evm] e38e699a42: will-it-scale.per_process_ops 160.4% improvement kernel test robot
2025-02-17 7:58 ` Mateusz Guzik
2025-02-17 9:58 ` Roberto Sassu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox