From: Feng Tang <feng.tang@intel.com>
To: "Sang, Oliver" <oliver.sang@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
"oe-lkp@lists.linux.dev" <oe-lkp@lists.linux.dev>,
lkp <lkp@intel.com>,
Linux Memory Management List <linux-mm@kvack.org>,
Jay Patel <jaypatel@linux.ibm.com>,
"Huang, Ying" <ying.huang@intel.com>,
"Yin, Fengwei" <fengwei.yin@intel.com>
Subject: Re: [linux-next:master] [mm/slub] 5886fc82b6: will-it-scale.per_process_ops -3.7% regression
Date: Fri, 27 Oct 2023 15:40:22 +0800
Message-ID: <ZTtpZgliuj/9WTOb@feng-clx>
In-Reply-To: <202310202221.fdbcbe56-oliver.sang@intel.com>
On Fri, Oct 20, 2023 at 10:21:28PM +0800, Sang, Oliver wrote:
>
>
> Hello,
>
> kernel test robot noticed a -3.7% regression of will-it-scale.per_process_ops on:
I was surprised to see this initially, since as far as I know this
patch only affects the page order of a few slab caches in a certain
size range, and 0Day has already enabled 64-byte alignment for
function addresses.
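(On recent kernels that setting corresponds roughly to the following
.config knobs -- names quoted from memory, so treat them as
illustrative:

    $ grep FUNCTION_ALIGNMENT .config
    CONFIG_FUNCTION_ALIGNMENT_64B=y
    CONFIG_FUNCTION_ALIGNMENT=64

which makes the compiler pad every function entry to a 64-byte
boundary.)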
The only big difference in the perf hot spots is:
> 19.62 +1.9 21.54 perf-profile.self.cycles-pp.__fget_light
but neither its code flow nor the data it touches has much to do with
this commit.
I ran the test case manually, and checking 'slabtop' showed that the
affected slabs were not in active use.
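The check was along these lines (the kmalloc cache names here are
just examples of the affected size range, not taken from the report):

    # watch cache activity, sorted by cache size, while the
    # benchmark runs
    $ slabtop -o -s c | head -20

    # or inspect the suspect caches directly
    $ grep -E 'kmalloc-(96|192)' /proc/slabinfo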
Then, as a hack, I moved slub.o to a very late position when linking
the kernel image, so that the text alignment of very few other kernel
objects would be affected, and the regression was gone.
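The hack was roughly the following (illustrative diff only; the exact
Makefile lines differ, but kbuild links obj-y objects in the order
they appear in the Makefile):

    --- a/mm/Makefile
    +++ b/mm/Makefile
    -obj-$(CONFIG_SLUB) += slub.o
     ...
    +# link slub.o last among the mm/ objects, so that changes to its
    +# text size do not shift the addresses of most other kernel code
    +obj-$(CONFIG_SLUB) += slub.o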
So this seems to be another strange performance change caused by text
(code) alignment shifts, similar to a recent case with an MCE patch:
https://lore.kernel.org/lkml/202310111637.dee70328-oliver.sang@intel.com/
Thanks,
Feng
>
>
> commit: 5886fc82b6e3166dd1ba876809888fc39028d626 ("mm/slub: attempt to find layouts up to 1/2 waste in calculate_order()")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: will-it-scale
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> parameters:
>
> nr_task: 50%
> mode: process
> test: poll2
> cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202310202221.fdbcbe56-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20231020/202310202221.fdbcbe56-oliver.sang@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale
>
> commit:
> 0fe2735d5e ("mm/slub: remove min_objects loop from calculate_order()")
> 5886fc82b6 ("mm/slub: attempt to find layouts up to 1/2 waste in calculate_order()")
>
> 0fe2735d5e2e0060 5886fc82b6e3166dd1ba8768098
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 28.08 +1.1% 28.40 boot-time.dhcp
> 6.17 ± 10% -15.4% 5.22 ± 10% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 6.17 ± 10% -15.4% 5.22 ± 10% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 98376568 -3.7% 94713387 will-it-scale.112.processes
> 878361 -3.7% 845654 will-it-scale.per_process_ops
> 98376568 -3.7% 94713387 will-it-scale.workload
> 81444 +4.8% 85370 proc-vmstat.nr_active_anon
> 85071 +4.8% 89137 proc-vmstat.nr_shmem
> 81444 +4.8% 85370 proc-vmstat.nr_zone_active_anon
> 79205 +3.8% 82205 proc-vmstat.pgactivate
> 5.18 -0.4 4.79 ± 2% perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
> 2.18 -0.2 2.03 ± 2% perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.29 -0.1 2.19 perf-profile.calltrace.cycles-pp.__entry_text_start.__poll
> 0.83 -0.1 0.76 ± 3% perf-profile.calltrace.cycles-pp.__check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64
> 0.90 -0.1 0.84 ± 2% perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64
> 0.66 ± 2% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.__virt_addr_valid.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll
> 0.66 -0.0 0.61 ± 3% perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 47.75 +1.3 49.07 perf-profile.calltrace.cycles-pp.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 22.63 +2.1 24.74 perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
> 5.17 -0.4 4.78 ± 2% perf-profile.children.cycles-pp.__fdget
> 2.35 -0.2 2.18 ± 2% perf-profile.children.cycles-pp.__check_object_size
> 0.84 -0.1 0.77 ± 3% perf-profile.children.cycles-pp.__check_heap_object
> 1.48 -0.1 1.41 perf-profile.children.cycles-pp.__entry_text_start
> 0.94 -0.1 0.87 ± 2% perf-profile.children.cycles-pp.check_heap_object
> 1.57 -0.1 1.51 ± 2% perf-profile.children.cycles-pp.__kmalloc
> 0.68 ± 2% -0.1 0.63 perf-profile.children.cycles-pp.__virt_addr_valid
> 0.66 -0.0 0.61 ± 3% perf-profile.children.cycles-pp.kfree
> 0.83 -0.0 0.79 ± 2% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 22.29 +1.7 24.01 perf-profile.children.cycles-pp.__fget_light
> 48.12 +1.7 49.84 perf-profile.children.cycles-pp.do_poll
> 7.66 -0.4 7.22 perf-profile.self.cycles-pp.do_sys_poll
> 2.58 ± 2% -0.2 2.38 ± 2% perf-profile.self.cycles-pp.__fdget
> 2.23 -0.1 2.12 ± 2% perf-profile.self.cycles-pp._copy_from_user
> 1.07 ± 3% -0.1 0.98 ± 2% perf-profile.self.cycles-pp.__poll
> 0.84 -0.1 0.77 ± 2% perf-profile.self.cycles-pp.__check_heap_object
> 0.66 ± 2% -0.1 0.61 ± 2% perf-profile.self.cycles-pp.__virt_addr_valid
> 0.65 -0.0 0.61 ± 3% perf-profile.self.cycles-pp.kfree
> 0.80 -0.0 0.76 ± 2% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.67 ± 2% -0.0 0.64 perf-profile.self.cycles-pp.__entry_text_start
> 19.62 +1.9 21.54 perf-profile.self.cycles-pp.__fget_light
> 2.225e+11 -3.7% 2.143e+11 perf-stat.i.branch-instructions
> 5.573e+08 -3.2% 5.393e+08 perf-stat.i.branch-misses
> 2332742 ± 2% -6.6% 2179079 perf-stat.i.cache-misses
> 13799351 -3.9% 13256775 perf-stat.i.cache-references
> 0.32 +5.0% 0.34 perf-stat.i.cpi
> 3.863e+11 +1.2% 3.908e+11 perf-stat.i.cpu-cycles
> 174616 ± 3% +9.1% 190529 ± 2% perf-stat.i.cycles-between-cache-misses
> 2.777e+11 -3.7% 2.675e+11 perf-stat.i.dTLB-loads
> 1.689e+11 -3.7% 1.627e+11 perf-stat.i.dTLB-stores
> 50719249 -2.8% 49295350 perf-stat.i.iTLB-load-misses
> 2674672 -14.5% 2285560 perf-stat.i.iTLB-loads
> 1.206e+12 -3.7% 1.161e+12 perf-stat.i.instructions
> 3.12 -4.8% 2.97 perf-stat.i.ipc
> 1.24 -4.0% 1.19 perf-stat.i.metric.G/sec
> 1.72 +1.1% 1.74 perf-stat.i.metric.GHz
> 76.66 -5.6% 72.34 perf-stat.i.metric.K/sec
> 1743 -3.5% 1683 perf-stat.i.metric.M/sec
> 594324 -2.9% 576831 perf-stat.i.node-load-misses
> 0.32 +5.0% 0.34 perf-stat.overall.cpi
> 165074 ± 2% +8.2% 178683 perf-stat.overall.cycles-between-cache-misses
> 3.12 -4.8% 2.97 perf-stat.overall.ipc
> 2.217e+11 -3.7% 2.135e+11 perf-stat.ps.branch-instructions
> 5.554e+08 -3.2% 5.375e+08 perf-stat.ps.branch-misses
> 2333651 ± 2% -6.6% 2179985 perf-stat.ps.cache-misses
> 13948192 -3.9% 13410551 perf-stat.ps.cache-references
> 3.849e+11 +1.2% 3.894e+11 perf-stat.ps.cpu-cycles
> 2.767e+11 -3.7% 2.665e+11 perf-stat.ps.dTLB-loads
> 1.683e+11 -3.7% 1.621e+11 perf-stat.ps.dTLB-stores
> 50558427 -2.8% 49131845 perf-stat.ps.iTLB-load-misses
> 2664632 -14.5% 2276961 ± 2% perf-stat.ps.iTLB-loads
> 1.201e+12 -3.7% 1.157e+12 perf-stat.ps.instructions
> 592459 -2.9% 575320 perf-stat.ps.node-load-misses
> 3.621e+14 -3.6% 3.492e+14 perf-stat.total.instructions