All of lore.kernel.org
 help / color / mirror / Atom feed
From: kernel test robot <oliver.sang@intel.com>
To: NeilBrown <neilb@suse.de>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-nfs@vger.kernel.org>, <ying.huang@intel.com>,
	<feng.tang@intel.com>, <fengwei.yin@intel.com>,
	"Chuck Lever" <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>,
	"Olga Kornievskaia" <kolga@netapp.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	<oliver.sang@intel.com>
Subject: Re: [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual
Date: Wed, 30 Oct 2024 14:35:19 +0800	[thread overview]
Message-ID: <202410301321.d8aebe67-oliver.sang@intel.com> (raw)
In-Reply-To: <20241023024222.691745-7-neilb@suse.de>



Hello,

kernel test robot noticed a 10.2% regression of fsmark.files_per_sec on:


commit: d7f6562adeebe62458eb11437b260d3f470849cd ("[PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual")
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/SUNRPC-move-nrthreads-counting-to-start-stop-threads/20241023-104539
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
patch link: https://lore.kernel.org/all/20241023024222.691745-7-neilb@suse.de/
patch subject: [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual

testcase: fsmark
config: x86_64-rhel-8.3
compiler: gcc-12
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
parameters:

	iterations: 1x
	nr_threads: 32t
	disk: 1SSD
	fs: btrfs
	fs2: nfsv4
	filesize: 8K
	test_size: 400M
	sync_method: fsyncBeforeClose
	nr_directories: 16d
	nr_files_per_directory: 256fpd
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | fsmark: fsmark.files_per_sec 10.2% regression                                                  |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | disk=1SSD                                                                                      |
|                  | filesize=9B                                                                                    |
|                  | fs2=nfsv4                                                                                      |
|                  | fs=btrfs                                                                                       |
|                  | iterations=1x                                                                                  |
|                  | nr_directories=16d                                                                             |
|                  | nr_files_per_directory=256fpd                                                                  |
|                  | nr_threads=32t                                                                                 |
|                  | sync_method=fsyncBeforeClose                                                                   |
|                  | test_size=400M                                                                                 |
+------------------+------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202410301321.d8aebe67-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241030/202410301321.d8aebe67-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1SSD/8K/nfsv4/btrfs/1x/x86_64-rhel-8.3/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep2/400M/fsmark

commit: 
  4e9c43765c ("sunrpc: remove all connection limit configuration")
  d7f6562ade ("sunrpc: introduce possibility that requested number of threads is different from actual")

4e9c43765c3fd361 d7f6562adeebe62458eb11437b2 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.09 ± 78%    +173.9%       0.24 ± 29%  perf-stat.i.major-faults
      0.07 ± 66%     -61.1%       0.03 ± 23%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.02 ± 88%   +1806.2%       0.46 ±196%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    -17.50           -31.4%     -12.00        sched_debug.cpu.nr_uninterruptible.min
      5.32 ± 10%     -11.9%       4.69 ±  8%  sched_debug.cpu.nr_uninterruptible.stddev
  76584407 ±  5%     +40.6%  1.077e+08 ±  4%  fsmark.app_overhead
      3016 ±  2%     -10.2%       2707        fsmark.files_per_sec
     36.83            -7.7%      34.00        fsmark.time.percent_of_cpu_this_job_got
    285398 ±  4%     +10.5%     315320        meminfo.Active
    282671 ±  4%     +10.6%     312596        meminfo.Active(file)
     15709 ± 11%     -28.8%      11189 ±  2%  meminfo.Dirty
     70764 ±  4%     +10.6%      78241        proc-vmstat.nr_active_file
    468998            +5.4%     494463        proc-vmstat.nr_dirtied
      3931 ± 11%     -28.8%       2797 ±  2%  proc-vmstat.nr_dirty
    462247            +6.7%     493402        proc-vmstat.nr_written
     70764 ±  4%     +10.6%      78241        proc-vmstat.nr_zone_active_file
      3620 ± 10%     -30.7%       2507 ±  5%  proc-vmstat.nr_zone_write_pending
   1135721            +2.6%    1165013        proc-vmstat.numa_hit
   1086049            +2.7%    1115353        proc-vmstat.numa_local
     56063 ±  2%     +18.0%      66177        proc-vmstat.pgactivate
   1521044            +2.3%    1555275        proc-vmstat.pgalloc_normal
   2522626           +10.2%    2778923        proc-vmstat.pgpgout
      3.75 ± 26%      -1.9        1.80 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.75 ± 26%      -1.9        1.80 ±  7%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.54 ± 39%      -1.7        0.89 ±100%  perf-profile.calltrace.cycles-pp.event_function_call.perf_event_release_kernel.perf_release.__fput.task_work_run
      2.54 ± 39%      -1.7        0.89 ±100%  perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_release_kernel.perf_release.__fput
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn.perf_mmap__push
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.51 ±141%      +1.6        2.09 ± 29%  perf-profile.calltrace.cycles-pp.evlist_cpu_iterator__next.__evlist__disable.__cmd_record.cmd_record.run_builtin
      0.81 ±100%      +2.2        3.01 ± 44%  perf-profile.calltrace.cycles-pp.__evlist__disable.__cmd_record.cmd_record.run_builtin.handle_internal_command
      3.11 ± 27%      +2.6        5.74 ± 24%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.handle_internal_command.main
      3.11 ± 27%      +2.6        5.74 ± 24%  perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.handle_internal_command.main
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.generic_perform_write
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.ksys_write
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.vfs_write
      0.81 ±100%      +2.2        3.01 ± 44%  perf-profile.children.cycles-pp.__evlist__disable


***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1SSD/9B/nfsv4/btrfs/1x/x86_64-rhel-8.3/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep2/400M/fsmark

commit: 
  4e9c43765c ("sunrpc: remove all connection limit configuration")
  d7f6562ade ("sunrpc: introduce possibility that requested number of threads is different from actual")

4e9c43765c3fd361 d7f6562adeebe62458eb11437b2 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.35            -2.0%       2.30        iostat.cpu.user
    137839 ± 27%     +41.1%     194529 ± 17%  numa-meminfo.node0.Active
    136284 ± 27%     +41.7%     193101 ± 17%  numa-meminfo.node0.Active(file)
      2.80 ± 32%      -1.6        1.16 ± 46%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      2.80 ± 32%      -1.6        1.16 ± 46%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
 1.397e+08 ±  2%     +14.5%  1.599e+08        fsmark.app_overhead
      4135           -10.2%       3712        fsmark.files_per_sec
     48.67            -8.2%      44.67        fsmark.time.percent_of_cpu_this_job_got
     34164 ± 27%     +41.6%      48381 ± 17%  numa-vmstat.node0.nr_active_file
    276516 ± 17%     +26.7%     350296 ± 10%  numa-vmstat.node0.nr_dirtied
    276215 ± 17%     +26.5%     349400 ± 10%  numa-vmstat.node0.nr_written
     34164 ± 27%     +41.6%      48381 ± 17%  numa-vmstat.node0.nr_zone_active_file
     80.71 ± 30%     +38.5%     111.75 ± 20%  sched_debug.cfs_rq:/.removed.load_avg.avg
    270.60 ± 12%     +16.0%     313.94 ±  9%  sched_debug.cfs_rq:/.removed.load_avg.stddev
     23.59 ± 44%     +49.9%      35.35 ± 24%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
     23.58 ± 44%     +49.8%      35.33 ± 24%  sched_debug.cfs_rq:/.removed.util_avg.avg
      3281 ± 11%   +1047.0%      37635 ±193%  sched_debug.cpu.avg_idle.min
    598666            +7.0%     640842        proc-vmstat.nr_dirtied
   1074454            +1.1%    1086656        proc-vmstat.nr_file_pages
    197321            +3.7%     204605        proc-vmstat.nr_inactive_file
    598082            +6.8%     638786        proc-vmstat.nr_written
    197321            +3.7%     204605        proc-vmstat.nr_zone_inactive_file
   1716397            +2.3%    1755672        proc-vmstat.numa_hit
   1666723            +2.4%    1705951        proc-vmstat.numa_local
    145474            +7.4%     156311        proc-vmstat.pgactivate
   2183965            +2.0%    2226921        proc-vmstat.pgalloc_normal
   3250818           +10.5%    3591588        proc-vmstat.pgpgout
      0.00 ±223%    +633.3%       0.02 ± 53%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.01 ±223%    +729.3%       0.06 ± 29%  perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ±223%    +466.7%       0.03 ± 22%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
      0.00 ±223%    +973.3%       0.03 ± 61%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.02 ±223%    +480.6%       0.09 ± 25%  perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.67 ±223%    +506.5%       4.04        perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      3.74 ± 52%    +190.1%      10.85 ± 36%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.68 ±223%    +526.0%       4.25 ±  8%  perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     18.67 ±173%    +375.2%      88.72 ± 23%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ±143%    +212.5%       0.03 ± 47%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.66 ±223%    +504.3%       3.99        perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      3.69 ± 53%    +192.6%      10.79 ± 36%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.67 ±223%    +525.6%       4.17 ±  8%  perf-sched.wait_time.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     18.65 ±173%    +375.4%      88.66 ± 23%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


  parent reply	other threads:[~2024-10-30  6:36 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-23  2:37 [PATCH 0/6] prepare for dynamic server thread management NeilBrown
2024-10-23  2:37 ` [PATCH 1/6] SUNRPC: move nrthreads counting to start/stop threads NeilBrown
2024-10-23  2:37 ` [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients NeilBrown
2024-10-23 13:42   ` Chuck Lever
2024-10-23 21:47     ` NeilBrown
2024-10-23  2:37 ` [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits NeilBrown
2024-10-23 11:48   ` Jeff Layton
2024-10-23 13:55   ` Chuck Lever
2024-10-23 16:34     ` Tom Talpey
2024-10-23 21:53       ` NeilBrown
2024-10-23  2:37 ` [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting calculations NeilBrown
2024-10-23 12:08   ` Jeff Layton
2024-10-23 21:18     ` NeilBrown
2024-10-23  2:37 ` [PATCH 5/6] sunrpc: remove all connection limit configuration NeilBrown
2024-10-23 12:50   ` Jeff Layton
2024-10-23  2:37 ` [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
2024-10-23 13:32   ` Jeff Layton
2024-10-30  6:35   ` kernel test robot [this message]
2024-10-23 14:00 ` [PATCH 0/6] prepare for dynamic server thread management Chuck Lever
2025-10-28 15:47 ` Jeff Layton
2025-10-28 22:36   ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202410301321.d8aebe67-oliver.sang@intel.com \
    --to=oliver.sang@intel.com \
    --cc=Dai.Ngo@oracle.com \
    --cc=chuck.lever@oracle.com \
    --cc=feng.tang@intel.com \
    --cc=fengwei.yin@intel.com \
    --cc=jlayton@kernel.org \
    --cc=kolga@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=neilb@suse.de \
    --cc=oe-lkp@lists.linux.dev \
    --cc=tom@talpey.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.