linux-mm.kvack.org archive mirror
* [linux-next:master] [mm/slub] 5886fc82b6: will-it-scale.per_process_ops -3.7% regression
@ 2023-10-20 14:21 kernel test robot
From: kernel test robot @ 2023-10-20 14:21 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: oe-lkp, lkp, Linux Memory Management List, Feng Tang, Jay Patel,
	ying.huang, fengwei.yin, oliver.sang



Hello,

kernel test robot noticed a -3.7% regression of will-it-scale.per_process_ops on:


commit: 5886fc82b6e3166dd1ba876809888fc39028d626 ("mm/slub: attempt to find layouts up to 1/2 waste in calculate_order()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
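
For context, the commit title describes relaxing the acceptable per-slab
waste while searching for a slab page order. Below is a rough standalone
sketch of that search; the function name, the PAGE_SIZE stand-in, and the
exact loop bounds are illustrative assumptions, not the actual mm/slub.c
code:

/*
 * Sketch of the layout search: accept an order once the leftover
 * bytes per slab fit within slab_size/fraction, relaxing fraction
 * from 1/16 down to 1/2 before giving up. Illustration only.
 */
#define SKETCH_PAGE_SIZE 4096u	/* stand-in for the kernel's PAGE_SIZE */

static int calculate_order_sketch(unsigned int size,
				  unsigned int min_order,
				  unsigned int max_order)
{
	unsigned int fraction, order;

	for (fraction = 16; fraction >= 2; fraction /= 2) {
		for (order = min_order; order <= max_order; order++) {
			unsigned int slab_size = SKETCH_PAGE_SIZE << order;
			unsigned int rem = slab_size % size;

			/* per-slab waste acceptable at this fraction */
			if (rem <= slab_size / fraction)
				return (int)order;
		}
	}
	return -1;	/* no layout within 1/2 waste found */
}

The practical effect is that a handful of object sizes can end up at a
different slab order than before, which is why a broad benchmark
regression from this change is surprising (see the analysis downthread).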

testcase: will-it-scale
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

	nr_task: 50%
	mode: process
	test: poll2
	cpufreq_governor: performance
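
The poll2 case has each worker process spin on poll() over a batch of
descriptors; a hedged sketch of the kind of loop being measured follows
(the real testcase lives in the will-it-scale suite and differs in fd
count and harness details -- NFDS and the iteration count below are
assumptions):

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

#define NFDS 128	/* assumed batch size; the real test may differ */

int main(void)
{
	struct pollfd fds[NFDS];
	int pipefd[2];
	unsigned long ops;
	int i;

	/* Point every pollfd slot at the read side of one pipe, so each
	 * iteration walks the fd-lookup path (__fget_light) many times. */
	if (pipe(pipefd))
		return 1;
	for (i = 0; i < NFDS; i++) {
		fds[i].fd = pipefd[0];
		fds[i].events = POLLIN;
	}

	/* Fixed count here; the real harness runs a timed interval and
	 * reports iterations/sec as will-it-scale.per_process_ops. */
	for (ops = 0; ops < 1000000; ops++)
		poll(fds, NFDS, 0);

	printf("%lu poll() calls\n", ops);
	return 0;
}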




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202310202221.fdbcbe56-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310202221.fdbcbe56-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale

commit: 
  0fe2735d5e ("mm/slub: remove min_objects loop from calculate_order()")
  5886fc82b6 ("mm/slub: attempt to find layouts up to 1/2 waste in calculate_order()")

0fe2735d5e2e0060 5886fc82b6e3166dd1ba8768098 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     28.08            +1.1%      28.40        boot-time.dhcp
      6.17 ± 10%     -15.4%       5.22 ± 10%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      6.17 ± 10%     -15.4%       5.22 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
  98376568            -3.7%   94713387        will-it-scale.112.processes
    878361            -3.7%     845654        will-it-scale.per_process_ops
  98376568            -3.7%   94713387        will-it-scale.workload
     81444            +4.8%      85370        proc-vmstat.nr_active_anon
     85071            +4.8%      89137        proc-vmstat.nr_shmem
     81444            +4.8%      85370        proc-vmstat.nr_zone_active_anon
     79205            +3.8%      82205        proc-vmstat.pgactivate
      5.18            -0.4        4.79 ±  2%  perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      2.18            -0.2        2.03 ±  2%  perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.29            -0.1        2.19        perf-profile.calltrace.cycles-pp.__entry_text_start.__poll
      0.83            -0.1        0.76 ±  3%  perf-profile.calltrace.cycles-pp.__check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.90            -0.1        0.84 ±  2%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.66 ±  2%      -0.1        0.61 ±  2%  perf-profile.calltrace.cycles-pp.__virt_addr_valid.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll
      0.66            -0.0        0.61 ±  3%  perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     47.75            +1.3       49.07        perf-profile.calltrace.cycles-pp.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.63            +2.1       24.74        perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      5.17            -0.4        4.78 ±  2%  perf-profile.children.cycles-pp.__fdget
      2.35            -0.2        2.18 ±  2%  perf-profile.children.cycles-pp.__check_object_size
      0.84            -0.1        0.77 ±  3%  perf-profile.children.cycles-pp.__check_heap_object
      1.48            -0.1        1.41        perf-profile.children.cycles-pp.__entry_text_start
      0.94            -0.1        0.87 ±  2%  perf-profile.children.cycles-pp.check_heap_object
      1.57            -0.1        1.51 ±  2%  perf-profile.children.cycles-pp.__kmalloc
      0.68 ±  2%      -0.1        0.63        perf-profile.children.cycles-pp.__virt_addr_valid
      0.66            -0.0        0.61 ±  3%  perf-profile.children.cycles-pp.kfree
      0.83            -0.0        0.79 ±  2%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
     22.29            +1.7       24.01        perf-profile.children.cycles-pp.__fget_light
     48.12            +1.7       49.84        perf-profile.children.cycles-pp.do_poll
      7.66            -0.4        7.22        perf-profile.self.cycles-pp.do_sys_poll
      2.58 ±  2%      -0.2        2.38 ±  2%  perf-profile.self.cycles-pp.__fdget
      2.23            -0.1        2.12 ±  2%  perf-profile.self.cycles-pp._copy_from_user
      1.07 ±  3%      -0.1        0.98 ±  2%  perf-profile.self.cycles-pp.__poll
      0.84            -0.1        0.77 ±  2%  perf-profile.self.cycles-pp.__check_heap_object
      0.66 ±  2%      -0.1        0.61 ±  2%  perf-profile.self.cycles-pp.__virt_addr_valid
      0.65            -0.0        0.61 ±  3%  perf-profile.self.cycles-pp.kfree
      0.80            -0.0        0.76 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.67 ±  2%      -0.0        0.64        perf-profile.self.cycles-pp.__entry_text_start
     19.62            +1.9       21.54        perf-profile.self.cycles-pp.__fget_light
 2.225e+11            -3.7%  2.143e+11        perf-stat.i.branch-instructions
 5.573e+08            -3.2%  5.393e+08        perf-stat.i.branch-misses
   2332742 ±  2%      -6.6%    2179079        perf-stat.i.cache-misses
  13799351            -3.9%   13256775        perf-stat.i.cache-references
      0.32            +5.0%       0.34        perf-stat.i.cpi
 3.863e+11            +1.2%  3.908e+11        perf-stat.i.cpu-cycles
    174616 ±  3%      +9.1%     190529 ±  2%  perf-stat.i.cycles-between-cache-misses
 2.777e+11            -3.7%  2.675e+11        perf-stat.i.dTLB-loads
 1.689e+11            -3.7%  1.627e+11        perf-stat.i.dTLB-stores
  50719249            -2.8%   49295350        perf-stat.i.iTLB-load-misses
   2674672           -14.5%    2285560        perf-stat.i.iTLB-loads
 1.206e+12            -3.7%  1.161e+12        perf-stat.i.instructions
      3.12            -4.8%       2.97        perf-stat.i.ipc
      1.24            -4.0%       1.19        perf-stat.i.metric.G/sec
      1.72            +1.1%       1.74        perf-stat.i.metric.GHz
     76.66            -5.6%      72.34        perf-stat.i.metric.K/sec
      1743            -3.5%       1683        perf-stat.i.metric.M/sec
    594324            -2.9%     576831        perf-stat.i.node-load-misses
      0.32            +5.0%       0.34        perf-stat.overall.cpi
    165074 ±  2%      +8.2%     178683        perf-stat.overall.cycles-between-cache-misses
      3.12            -4.8%       2.97        perf-stat.overall.ipc
 2.217e+11            -3.7%  2.135e+11        perf-stat.ps.branch-instructions
 5.554e+08            -3.2%  5.375e+08        perf-stat.ps.branch-misses
   2333651 ±  2%      -6.6%    2179985        perf-stat.ps.cache-misses
  13948192            -3.9%   13410551        perf-stat.ps.cache-references
 3.849e+11            +1.2%  3.894e+11        perf-stat.ps.cpu-cycles
 2.767e+11            -3.7%  2.665e+11        perf-stat.ps.dTLB-loads
 1.683e+11            -3.7%  1.621e+11        perf-stat.ps.dTLB-stores
  50558427            -2.8%   49131845        perf-stat.ps.iTLB-load-misses
   2664632           -14.5%    2276961 ±  2%  perf-stat.ps.iTLB-loads
 1.201e+12            -3.7%  1.157e+12        perf-stat.ps.instructions
    592459            -2.9%     575320        perf-stat.ps.node-load-misses
 3.621e+14            -3.6%  3.492e+14        perf-stat.total.instructions




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




* Re: [linux-next:master] [mm/slub] 5886fc82b6: will-it-scale.per_process_ops -3.7% regression
@ 2023-10-27  7:40 ` Feng Tang
From: Feng Tang @ 2023-10-27  7:40 UTC (permalink / raw)
  To: Sang, Oliver
  Cc: Vlastimil Babka, oe-lkp@lists.linux.dev, lkp,
	Linux Memory Management List, Jay Patel, Huang, Ying,
	Yin, Fengwei

On Fri, Oct 20, 2023 at 10:21:28PM +0800, Sang, Oliver wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a -3.7% regression of will-it-scale.per_process_ops on:

I was surprised to see this initially, as I know this patch only
affects the order of a few slabs in a certain size range, and
0Day has enabled 64-byte alignment for function addresses.

The only big difference among the perf hot spots is

>      19.62            +1.9       21.54        perf-profile.self.cycles-pp.__fget_light

but its code flow and data don't have much to do with the commit.

I manually ran the test case and, checking with 'slabtop', didn't
see the affected slabs being actively used.

Then I hacked the build to move slub.c to a very late position when
linking the kernel image, so that the alignment of very few other
kernel objects would be affected, and the regression was gone.

So this seems to be another strange perf change caused by shifts in
kernel text alignment, similar to another recent case involving an
MCE patch:
https://lore.kernel.org/lkml/202310111637.dee70328-oliver.sang@intel.com/
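
For reference, the 64-byte function alignment mentioned above
corresponds to building with -falign-functions=64
(CONFIG_FUNCTION_ALIGNMENT_64B in recent kernels). A minimal sketch of
pinning a single function the same way, assuming GCC/Clang attribute
support -- illustration only, not something the kernel does per
function:

/*
 * Force this function to start on a 64-byte boundary, so that size
 * changes in unrelated, earlier-linked code cannot shift how its
 * start maps onto cache lines. The kernel-wide knob for the same
 * effect is -falign-functions=64.
 */
__attribute__((aligned(64)))
unsigned long hot_lookup(const unsigned long *table, unsigned int idx)
{
	return table[idx];
}

Note that cache-line alignment does not pin page-level placement, so
page-granular effects can still move around, which may be why such
noise survives the alignment knob (note the iTLB-loads delta in the
stats above).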

Thanks,
Feng



* Re: [linux-next:master] [mm/slub] 5886fc82b6: will-it-scale.per_process_ops -3.7% regression
@ 2023-10-27  7:56   ` Vlastimil Babka
From: Vlastimil Babka @ 2023-10-27  7:56 UTC (permalink / raw)
  To: Feng Tang, Sang, Oliver
  Cc: oe-lkp@lists.linux.dev, lkp, Linux Memory Management List,
	Jay Patel, Huang, Ying, Yin, Fengwei

On 10/27/23 09:40, Feng Tang wrote:
> On Fri, Oct 20, 2023 at 10:21:28PM +0800, Sang, Oliver wrote:
>> 
>> 
>> Hello,
>> 
>> kernel test robot noticed a -3.7% regression of will-it-scale.per_process_ops on:
> 
> I was surprised to see this initially, as I know this patch only
> affects the order of a few slabs in a certain size range, and
> 0Day has enabled 64-byte alignment for function addresses.
> 
> The only big difference among the perf hot spots is
> 
>>      19.62            +1.9       21.54        perf-profile.self.cycles-pp.__fget_light
> 
> but its code flow and data don't have much to do with the commit.
> 
> I manually ran the test case and, checking with 'slabtop', didn't
> see the affected slabs being actively used.
> 
> Then I hacked the build to move slub.c to a very late position when
> linking the kernel image, so that the alignment of very few other
> kernel objects would be affected, and the regression was gone.
> 
> So this seems to be another strange perf change caused by shifts in
> kernel text alignment, similar to another recent case involving an
> MCE patch:
> https://lore.kernel.org/lkml/202310111637.dee70328-oliver.sang@intel.com/

I suspected it would be something like this, thanks for confirming!



