From: kernel test robot <oliver.sang@intel.com>
To: Kundan Kumar <kundan.kumar@samsung.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	Anuj Gupta <anuj20.g@samsung.com>, <linux-mm@kvack.org>,
	<jaegeuk@kernel.org>, <chao@kernel.org>,
	<viro@zeniv.linux.org.uk>, <brauner@kernel.org>, <jack@suse.cz>,
	<miklos@szeredi.hu>, <agruenba@redhat.com>, <trondmy@kernel.org>,
	<anna@kernel.org>, <akpm@linux-foundation.org>,
	<willy@infradead.org>, <mcgrof@kernel.org>, <clm@meta.com>,
	<david@fromorbit.com>, <amir73il@gmail.com>, <axboe@kernel.dk>,
	<hch@lst.de>, <ritesh.list@gmail.com>, <djwong@kernel.org>,
	<dave@stgolabs.net>, <p.raghav@samsung.com>,
	<da.gomez@samsung.com>, <linux-f2fs-devel@lists.sourceforge.net>,
	<linux-fsdevel@vger.kernel.org>, <gfs2@lists.linux.dev>,
	<linux-nfs@vger.kernel.org>, <gost.dev@samsung.com>,
	Kundan Kumar <kundan.kumar@samsung.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH 13/13] writeback: set the num of writeback contexts to number of online cpus
Date: Tue, 3 Jun 2025 22:36:11 +0800
Message-ID: <202506032246.89ddc1a2-lkp@intel.com>
In-Reply-To: <20250529111504.89912-14-kundan.kumar@samsung.com>



Hello,

kernel test robot noticed a 53.9% improvement in fsmark.files_per_sec on:


commit: 2850eee23dbc4ff9878d88625b1f84965eefcce6 ("[PATCH 13/13] writeback: set the num of writeback contexts to number of online cpus")
url: https://github.com/intel-lab-lkp/linux/commits/Kundan-Kumar/writeback-add-infra-for-parallel-writeback/20250529-193523
base: https://git.kernel.org/cgit/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/all/20250529111504.89912-14-kundan.kumar@samsung.com/
patch subject: [PATCH 13/13] writeback: set the num of writeback contexts to number of online cpus

testcase: fsmark
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 176G memory
parameters:

	iterations: 1x
	nr_threads: 32t
	disk: 1SSD
	fs: ext4
	filesize: 16MB
	test_size: 60G
	sync_method: NoSync
	nr_directories: 16d
	nr_files_per_directory: 256fpd
	cpufreq_governor: performance


In addition, the commit has a significant impact on the following tests:

+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | filebench: filebench.sum_operations/s 4.3% improvement                                         |
| test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | disk=1HDD                                                                                      |
|                  | fs=xfs                                                                                         |
|                  | test=fivestreamwrite.f                                                                         |
+------------------+------------------------------------------------------------------------------------------------+




Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250603/202506032246.89ddc1a2-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1SSD/16MB/ext4/1x/x86_64-rhel-9.4/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/NoSync/lkp-csl-2sp10/60G/fsmark

commit: 
  a2dadb7ea8 ("nfs: add support in nfs to handle multiple writeback contexts")
  2850eee23d ("writeback: set the num of writeback contexts to number of online cpus")

a2dadb7ea862d5c1 2850eee23dbc4ff9878d88625b1 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1641480           +13.3%    1860148 ±  2%  cpuidle..usage
    302.00 ±  8%     +14.1%     344.67 ±  7%  perf-c2c.HITM.remote
     24963 ±  4%     -13.3%      21647 ±  6%  uptime.idle
     91.64           -22.2%      71.26 ±  7%  iostat.cpu.idle
      7.34 ±  4%    +275.7%      27.59 ± 19%  iostat.cpu.iowait
      0.46 ±141%      +2.2        2.63 ± 66%  perf-profile.calltrace.cycles-pp.setlocale
      0.46 ±141%      +2.2        2.63 ± 66%  perf-profile.children.cycles-pp.setlocale
    194019            -7.5%     179552        fsmark.app_overhead
    108.10 ±  8%     +53.9%     166.40 ± 10%  fsmark.files_per_sec
     43295 ±  7%     +35.8%      58787 ±  5%  fsmark.time.voluntary_context_switches
  19970922           -10.5%   17867270 ±  2%  meminfo.Dirty
    493817           +13.1%     558422        meminfo.SUnreclaim
    141708         +1439.0%    2180863 ± 14%  meminfo.Writeback
   4993428           -10.3%    4480219 ±  2%  proc-vmstat.nr_dirty
     34285            +5.8%      36262        proc-vmstat.nr_kernel_stack
    123504           +13.1%     139636        proc-vmstat.nr_slab_unreclaimable
     36381 ±  4%   +1403.0%     546810 ± 14%  proc-vmstat.nr_writeback
     91.54           -22.1%      71.32 ±  7%  vmstat.cpu.id
      7.22 ±  4%    +280.4%      27.47 ± 19%  vmstat.cpu.wa
     14.58 ±  4%    +537.4%      92.92 ±  8%  vmstat.procs.b
      6140 ±  2%     +90.1%      11673 ±  9%  vmstat.system.cs
     91.46           -20.9       70.56 ±  7%  mpstat.cpu.all.idle%
      7.52 ±  4%     +20.8       28.29 ± 19%  mpstat.cpu.all.iowait%
      0.12 ±  7%      +0.0        0.14 ±  7%  mpstat.cpu.all.irq%
      0.35 ±  6%      +0.1        0.43 ±  5%  mpstat.cpu.all.sys%
     11.24 ±  8%     +20.7%      13.56 ±  3%  mpstat.max_utilization_pct
     34947 ±  5%     +14.3%      39928 ±  4%  numa-vmstat.node0.nr_slab_unreclaimable
      9001 ± 14%   +1553.7%     148860 ± 19%  numa-vmstat.node0.nr_writeback
   1329092 ±  4%     -20.9%    1051569 ±  9%  numa-vmstat.node1.nr_dirty
     10019 ±  7%   +1490.0%     159311 ± 14%  numa-vmstat.node1.nr_writeback
   2808522 ±  8%     -17.7%    2310216 ±  2%  numa-vmstat.node2.nr_file_pages
   2638799 ±  3%     -12.7%    2304024 ±  2%  numa-vmstat.node2.nr_inactive_file
      7810 ±  9%   +1035.8%      88707 ± 16%  numa-vmstat.node2.nr_writeback
   2638797 ±  3%     -12.7%    2304025 ±  2%  numa-vmstat.node2.nr_zone_inactive_file
     29952 ±  3%     +13.4%      33964 ±  4%  numa-vmstat.node3.nr_slab_unreclaimable
     10686 ±  9%   +1351.3%     155091 ± 12%  numa-vmstat.node3.nr_writeback
    139656 ±  5%     +14.2%     159539 ±  4%  numa-meminfo.node0.SUnreclaim
     35586 ± 13%   +1565.8%     592799 ± 18%  numa-meminfo.node0.Writeback
   5304285 ±  4%     -20.8%    4198452 ± 10%  numa-meminfo.node1.Dirty
     40011 ±  5%   +1484.2%     633862 ± 14%  numa-meminfo.node1.Writeback
  11211668 ±  7%     -17.7%    9222157 ±  2%  numa-meminfo.node2.FilePages
  10532776 ±  3%     -12.7%    9197387 ±  2%  numa-meminfo.node2.Inactive
  10532776 ±  3%     -12.7%    9197387 ±  2%  numa-meminfo.node2.Inactive(file)
  12378624 ±  7%     -15.0%   10520827 ±  2%  numa-meminfo.node2.MemUsed
     29574 ±  9%   +1087.0%     351055 ± 16%  numa-meminfo.node2.Writeback
    119679 ±  3%     +13.4%     135718 ±  4%  numa-meminfo.node3.SUnreclaim
     41446 ± 10%   +1380.1%     613443 ± 11%  numa-meminfo.node3.Writeback
     23.38 ±  2%      -6.7       16.72        perf-stat.i.cache-miss-rate%
  38590732 ±  3%     +53.9%   59394561 ±  5%  perf-stat.i.cache-references
      5973 ±  2%     +96.3%      11729 ± 10%  perf-stat.i.context-switches
      0.92            +7.4%       0.99        perf-stat.i.cpi
 7.023e+09 ±  3%     +12.5%  7.898e+09 ±  4%  perf-stat.i.cpu-cycles
    237.41 ±  2%    +393.6%       1171 ± 20%  perf-stat.i.cpu-migrations
      1035 ±  3%      +9.3%       1132 ±  4%  perf-stat.i.cycles-between-cache-misses
      1.15            -5.7%       1.09        perf-stat.i.ipc
     25.41 ±  2%      -7.4       18.03 ±  2%  perf-stat.overall.cache-miss-rate%
      0.94            +7.2%       1.01        perf-stat.overall.cpi
      1.06            -6.8%       0.99        perf-stat.overall.ipc
  38042801 ±  3%     +54.0%   58576659 ±  5%  perf-stat.ps.cache-references
      5897 ±  2%     +96.3%      11577 ± 10%  perf-stat.ps.context-switches
 6.925e+09 ±  3%     +12.4%  7.787e+09 ±  4%  perf-stat.ps.cpu-cycles
    234.11 ±  2%    +394.2%       1156 ± 20%  perf-stat.ps.cpu-migrations
 5.892e+11            +1.4%  5.977e+11        perf-stat.total.instructions
      0.08 ±  6%     -36.2%       0.05 ± 33%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_noprof.__filemap_get_folio
      0.01 ± 73%    +159.8%       0.04 ± 13%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.07 ± 10%     -33.0%       0.05 ± 44%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      0.01 ± 73%    +695.6%       0.09 ± 25%  perf-sched.sch_delay.avg.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
      0.03 ± 30%     -45.4%       0.02 ± 24%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      0.06 ± 21%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.kjournald2.kthread.ret_from_fork.ret_from_fork_asm
      0.05 ± 45%    +305.0%       0.19 ± 66%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ±210%   +8303.3%       0.85 ±183%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.mpage_map_and_submit_extent.ext4_do_writepages.ext4_writepages
      0.05 ± 49%     +92.4%       0.10 ± 35%  perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      0.03 ± 76%    +165.2%       0.07 ± 28%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.07 ± 46%  +83423.5%      59.86 ±146%  perf-sched.sch_delay.max.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
      0.06 ± 21%    -100.0%       0.00        perf-sched.sch_delay.max.ms.kjournald2.kthread.ret_from_fork.ret_from_fork_asm
      0.09 ±  7%     +19.7%       0.10 ±  8%  perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.10 ± 13%   +3587.7%       3.60 ±172%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     18911          +109.0%      39526 ± 25%  perf-sched.total_wait_and_delay.count.ms
      4.42 ± 25%     +77.8%       7.86 ± 15%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    150.39 ±  6%     -14.9%     127.97 ±  8%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      8.03 ± 89%     -74.5%       2.05 ±143%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.24 ±  8%   +2018.4%      26.26 ± 26%  perf-sched.wait_and_delay.avg.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
      0.83 ±  2%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     86.14 ±  9%     +33.0%     114.52 ±  7%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1047 ±  6%      -8.2%     960.83        perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    171.50 ±  7%     -69.1%      53.00 ±141%  perf-sched.wait_and_delay.count.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
    162.50 ±  7%     -69.3%      49.83 ±141%  perf-sched.wait_and_delay.count.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3635 ±  6%    +451.1%      20036 ± 35%  perf-sched.wait_and_delay.count.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
     26.33 ±  5%     -12.7%      23.00 ±  2%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
    116.17          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      3938 ± 21%     -66.2%       1332 ± 60%  perf-sched.wait_and_delay.count.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
      4831 ±  5%    +102.8%       9799 ± 23%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     87.47 ± 20%    +823.3%     807.60 ±141%  perf-sched.wait_and_delay.max.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
      2.81 ±  4%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     77.99 ± 13%   +2082.7%       1702 ± 73%  perf-sched.wait_and_delay.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
    396.00 ± 16%     -34.0%     261.22 ± 33%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      3.89 ± 18%    +100.6%       7.81 ± 15%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.13 ±157%  +2.1e+05%     271.76 ±126%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.mpage_map_and_submit_extent.ext4_do_writepages.ext4_writepages
    150.37 ±  6%     -15.2%     127.52 ±  8%  perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      1.23 ±  8%   +2030.8%      26.17 ± 26%  perf-sched.wait_time.avg.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
     31.17 ± 50%    -100.0%       0.00        perf-sched.wait_time.avg.ms.kjournald2.kthread.ret_from_fork.ret_from_fork_asm
     86.09 ±  9%     +32.8%     114.34 ±  7%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1086 ± 17%    +309.4%       4449 ± 26%  perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.17 ±142%  +7.2e+05%       1196 ±119%  perf-sched.wait_time.max.ms.__cond_resched.down_write.mpage_map_and_submit_extent.ext4_do_writepages.ext4_writepages
      7.27 ± 45%   +1259.6%      98.80 ±139%  perf-sched.wait_time.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
    262.77 ±113%    +316.1%       1093 ± 52%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     87.45 ± 20%    +823.5%     807.53 ±141%  perf-sched.wait_time.max.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
      0.04 ± 30%    +992.0%       0.43 ± 92%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
     31.17 ± 50%    -100.0%       0.00        perf-sched.wait_time.max.ms.kjournald2.kthread.ret_from_fork.ret_from_fork_asm
     75.85 ± 16%   +2144.2%       1702 ± 73%  perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
    395.95 ± 16%     -34.0%     261.15 ± 33%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
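
For anyone checking the table above by hand: the %change column appears to be
computed in two ways. Ordinary metrics show a relative change in percent, while
metrics that are already percentages (names ending in "%", and the perf-profile
rows) show an absolute difference in percentage points. The sketch below is a
reading of the published numbers, not the robot's actual code, and the function
names are hypothetical. Because the table reports averaged values, a recomputed
figure may differ in the last digit for some rows.

```python
def relative_change(base, patched):
    """Relative change in percent, as in rows such as fsmark.files_per_sec."""
    return (patched - base) / base * 100.0


def point_change(base, patched):
    """Absolute difference, as in rows whose metric is already a percentage."""
    return patched - base


# Values copied from the table above: (base commit, patched commit).
print(f"fsmark.files_per_sec  {relative_change(108.10, 166.40):+.1f}%")
print(f"meminfo.Writeback     {relative_change(141708, 2180863):+.1f}%")
print(f"mpstat.cpu.all.idle%  {point_change(91.46, 70.56):+.1f}")
```

The three printed values match the +53.9%, +1439.0% and -20.9 entries reported
for those rows.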


***************************************************************************************************
lkp-icl-2sp6: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/1HDD/xfs/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-icl-2sp6/fivestreamwrite.f/filebench

commit: 
  a2dadb7ea8 ("nfs: add support in nfs to handle multiple writeback contexts")
  2850eee23d ("writeback: set the num of writeback contexts to number of online cpus")

a2dadb7ea862d5c1 2850eee23dbc4ff9878d88625b1 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.06 ±  3%      +1.5        3.58        mpstat.cpu.all.iowait%
   8388855 ±  5%     +17.7%    9875928 ±  5%  numa-meminfo.node0.Dirty
      0.02 ±  5%     +48.0%       0.03 ±  3%  sched_debug.cpu.nr_uninterruptible.avg
      2.70           +72.6%       4.65        vmstat.procs.b
     97.67            -1.5%      96.17        iostat.cpu.idle
      2.04 ±  3%     +73.9%       3.55        iostat.cpu.iowait
   2094449 ±  5%     +17.8%    2468063 ±  5%  numa-vmstat.node0.nr_dirty
   2113170 ±  5%     +17.7%    2487005 ±  5%  numa-vmstat.node0.nr_zone_write_pending
      6.99 ±  3%      +0.5        7.48 ±  2%  perf-stat.i.cache-miss-rate%
      1.82            +3.2%       1.88        perf-stat.i.cpi
      0.64            -2.0%       0.62        perf-stat.i.ipc
      2.88 ±  5%      +9.5%       3.15 ±  4%  perf-stat.overall.MPKI
    464.45            +4.3%     484.58        filebench.sum_bytes_mb/s
     27873            +4.3%      29084        filebench.sum_operations
    464.51            +4.3%     484.66        filebench.sum_operations/s
     10.76            -4.2%      10.31        filebench.sum_time_ms/op
    464.67            +4.3%     484.67        filebench.sum_writes/s
  57175040            +4.2%   59565397        filebench.time.file_system_outputs
   7146880            +4.2%    7445674        proc-vmstat.nr_dirtied
   4412053            +9.1%    4815253        proc-vmstat.nr_dirty
   7485090            +5.0%    7855964        proc-vmstat.nr_file_pages
  24899858            -1.5%   24530112        proc-vmstat.nr_free_pages
  24705120            -1.5%   24343672        proc-vmstat.nr_free_pages_blocks
   6573042            +5.6%    6943969        proc-vmstat.nr_inactive_file
     34473 ±  3%      +7.5%      37072        proc-vmstat.nr_writeback
   6573042            +5.6%    6943969        proc-vmstat.nr_zone_inactive_file
   4446526            +9.1%    4852325        proc-vmstat.nr_zone_write_pending
   7963041            +3.8%    8262916        proc-vmstat.pgalloc_normal
      0.02 ± 10%     +45.0%       0.03 ±  5%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    317.88 ±166%    -100.0%       0.08 ± 60%  perf-sched.sch_delay.avg.ms.kthreadd.ret_from_fork.ret_from_fork_asm
    474.99 ±141%    -100.0%       0.10 ± 49%  perf-sched.sch_delay.max.ms.kthreadd.ret_from_fork.ret_from_fork_asm
     17.87 ± 13%    +125.8%      40.36 ±  4%  perf-sched.wait_and_delay.avg.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
     47.75           +19.8%      57.20 ±  5%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
    517.00           -17.2%     427.83 ±  5%  perf-sched.wait_and_delay.count.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
     54.05 ±  2%    +253.0%     190.80 ± 18%  perf-sched.wait_and_delay.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
      4286 ±  4%      -8.8%       3909 ±  8%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     17.77 ± 13%    +126.0%      40.16 ±  4%  perf-sched.wait_time.avg.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
     47.63           +19.8%      57.06 ±  5%  perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
     53.95 ±  2%    +253.5%     190.70 ± 18%  perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
      4285 ±  4%      -8.9%       3906 ±  7%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.77 ± 15%      -0.3        0.43 ± 72%  perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue
      1.76 ±  9%      -0.3        1.44 ±  8%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.73 ± 10%      -0.3        1.43 ±  8%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.85 ± 12%      -0.2        0.66 ± 14%  perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule_idle.do_idle.cpu_startup_entry
      4.74 ±  6%      -0.6        4.17 ±  9%  perf-profile.children.cycles-pp.__handle_mm_fault
      4.92 ±  6%      -0.6        4.36 ±  9%  perf-profile.children.cycles-pp.handle_mm_fault
      2.85 ±  5%      -0.5        2.34 ±  8%  perf-profile.children.cycles-pp.enqueue_task
      2.60 ±  4%      -0.4        2.15 ±  8%  perf-profile.children.cycles-pp.enqueue_task_fair
      2.58 ±  6%      -0.4        2.22 ±  8%  perf-profile.children.cycles-pp.do_pte_missing
      2.04 ±  9%      -0.3        1.70 ± 10%  perf-profile.children.cycles-pp.do_read_fault
      1.88 ±  5%      -0.3        1.56 ±  7%  perf-profile.children.cycles-pp.ttwu_do_activate
      1.85 ± 10%      -0.3        1.58 ± 12%  perf-profile.children.cycles-pp.filemap_map_pages
      1.20 ±  8%      -0.2        0.98 ±  9%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.49 ± 23%      -0.2        0.29 ± 23%  perf-profile.children.cycles-pp.set_next_task_fair
      0.38 ± 32%      -0.2        0.20 ± 39%  perf-profile.children.cycles-pp.strnlen_user
      0.44 ± 28%      -0.2        0.26 ± 27%  perf-profile.children.cycles-pp.set_next_entity
      0.70 ±  7%      -0.2        0.52 ±  7%  perf-profile.children.cycles-pp.folios_put_refs
      0.22 ± 20%      -0.1        0.15 ± 27%  perf-profile.children.cycles-pp.try_charge_memcg
      0.02 ±141%      +0.1        0.08 ± 44%  perf-profile.children.cycles-pp.__blk_mq_alloc_driver_tag
      0.09 ± 59%      +0.1        0.18 ± 21%  perf-profile.children.cycles-pp.irq_work_tick
      0.26 ± 22%      +0.2        0.41 ± 13%  perf-profile.children.cycles-pp.cpu_stop_queue_work
      0.34 ± 31%      +0.2        0.57 ± 27%  perf-profile.children.cycles-pp.perf_event_mmap_event
      2.68 ± 10%      +0.4        3.10 ± 12%  perf-profile.children.cycles-pp.sched_balance_domains
      0.38 ± 32%      -0.2        0.19 ± 44%  perf-profile.self.cycles-pp.strnlen_user
      0.02 ±141%      +0.1        0.08 ± 44%  perf-profile.self.cycles-pp.ahci_single_level_irq_intr
      0.22 ± 36%      +0.2        0.43 ± 21%  perf-profile.self.cycles-pp.sched_balance_rq
      0.36 ± 41%      +0.3        0.66 ± 18%  perf-profile.self.cycles-pp._find_next_and_bit
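
The numa-vmstat rows count pages while the numa-meminfo rows are in kB, so the
two can be cross-checked. The sketch below assumes 4 KiB base pages (the x86_64
default; the report itself does not state the page size) and uses hypothetical
helper names. It compares numa-vmstat.node0.nr_dirty with
numa-meminfo.node0.Dirty from the table above. A small mismatch is expected,
since both are averages of separately collected samples.

```python
PAGE_KB = 4  # assumption: 4 KiB base pages, the x86_64 default


def pages_to_kb(pages):
    """Convert a page count to kB under the PAGE_KB assumption."""
    return pages * PAGE_KB


def mismatch_pct(pages, reported_kb):
    """Percent difference between the converted page count and the kB figure."""
    return (pages_to_kb(pages) - reported_kb) / reported_kb * 100.0


# (label, numa-vmstat.node0.nr_dirty, numa-meminfo.node0.Dirty) from the table.
rows = [
    ("base", 2094449, 8388855),
    ("patched", 2468063, 9875928),
]
for label, nr_dirty, dirty_kb in rows:
    print(f"{label:8s} {pages_to_kb(nr_dirty)} kB vs {dirty_kb} kB "
          f"({mismatch_pct(nr_dirty, dirty_kb):+.2f}%)")
```

Both rows agree to within a fraction of a percent, which is consistent with the
two counters describing the same dirty page-cache data.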





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


      0.10 ± 13%   +3587.7%       3.60 ±172%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     18911          +109.0%      39526 ± 25%  perf-sched.total_wait_and_delay.count.ms
      4.42 ± 25%     +77.8%       7.86 ± 15%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    150.39 ±  6%     -14.9%     127.97 ±  8%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      8.03 ± 89%     -74.5%       2.05 ±143%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.24 ±  8%   +2018.4%      26.26 ± 26%  perf-sched.wait_and_delay.avg.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
      0.83 ±  2%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     86.14 ±  9%     +33.0%     114.52 ±  7%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1047 ±  6%      -8.2%     960.83        perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    171.50 ±  7%     -69.1%      53.00 ±141%  perf-sched.wait_and_delay.count.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
    162.50 ±  7%     -69.3%      49.83 ±141%  perf-sched.wait_and_delay.count.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3635 ±  6%    +451.1%      20036 ± 35%  perf-sched.wait_and_delay.count.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
     26.33 ±  5%     -12.7%      23.00 ±  2%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
    116.17          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      3938 ± 21%     -66.2%       1332 ± 60%  perf-sched.wait_and_delay.count.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
      4831 ±  5%    +102.8%       9799 ± 23%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     87.47 ± 20%    +823.3%     807.60 ±141%  perf-sched.wait_and_delay.max.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
      2.81 ±  4%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     77.99 ± 13%   +2082.7%       1702 ± 73%  perf-sched.wait_and_delay.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
    396.00 ± 16%     -34.0%     261.22 ± 33%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      3.89 ± 18%    +100.6%       7.81 ± 15%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.13 ±157%  +2.1e+05%     271.76 ±126%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.mpage_map_and_submit_extent.ext4_do_writepages.ext4_writepages
    150.37 ±  6%     -15.2%     127.52 ±  8%  perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      1.23 ±  8%   +2030.8%      26.17 ± 26%  perf-sched.wait_time.avg.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
     31.17 ± 50%    -100.0%       0.00        perf-sched.wait_time.avg.ms.kjournald2.kthread.ret_from_fork.ret_from_fork_asm
     86.09 ±  9%     +32.8%     114.34 ±  7%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1086 ± 17%    +309.4%       4449 ± 26%  perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.17 ±142%  +7.2e+05%       1196 ±119%  perf-sched.wait_time.max.ms.__cond_resched.down_write.mpage_map_and_submit_extent.ext4_do_writepages.ext4_writepages
      7.27 ± 45%   +1259.6%      98.80 ±139%  perf-sched.wait_time.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
    262.77 ±113%    +316.1%       1093 ± 52%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     87.45 ± 20%    +823.5%     807.53 ±141%  perf-sched.wait_time.max.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
      0.04 ± 30%    +992.0%       0.43 ± 92%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
     31.17 ± 50%    -100.0%       0.00        perf-sched.wait_time.max.ms.kjournald2.kthread.ret_from_fork.ret_from_fork_asm
     75.85 ± 16%   +2144.2%       1702 ± 73%  perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
    395.95 ± 16%     -34.0%     261.15 ± 33%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread


***************************************************************************************************
lkp-icl-2sp6: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/1HDD/xfs/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-icl-2sp6/fivestreamwrite.f/filebench

commit: 
  a2dadb7ea8 ("nfs: add support in nfs to handle multiple writeback contexts")
  2850eee23d ("writeback: set the num of writeback contexts to number of online cpus")

a2dadb7ea862d5c1 2850eee23dbc4ff9878d88625b1 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.06 ±  3%      +1.5        3.58        mpstat.cpu.all.iowait%
   8388855 ±  5%     +17.7%    9875928 ±  5%  numa-meminfo.node0.Dirty
      0.02 ±  5%     +48.0%       0.03 ±  3%  sched_debug.cpu.nr_uninterruptible.avg
      2.70           +72.6%       4.65        vmstat.procs.b
     97.67            -1.5%      96.17        iostat.cpu.idle
      2.04 ±  3%     +73.9%       3.55        iostat.cpu.iowait
   2094449 ±  5%     +17.8%    2468063 ±  5%  numa-vmstat.node0.nr_dirty
   2113170 ±  5%     +17.7%    2487005 ±  5%  numa-vmstat.node0.nr_zone_write_pending
      6.99 ±  3%      +0.5        7.48 ±  2%  perf-stat.i.cache-miss-rate%
      1.82            +3.2%       1.88        perf-stat.i.cpi
      0.64            -2.0%       0.62        perf-stat.i.ipc
      2.88 ±  5%      +9.5%       3.15 ±  4%  perf-stat.overall.MPKI
    464.45            +4.3%     484.58        filebench.sum_bytes_mb/s
     27873            +4.3%      29084        filebench.sum_operations
    464.51            +4.3%     484.66        filebench.sum_operations/s
     10.76            -4.2%      10.31        filebench.sum_time_ms/op
    464.67            +4.3%     484.67        filebench.sum_writes/s
  57175040            +4.2%   59565397        filebench.time.file_system_outputs
   7146880            +4.2%    7445674        proc-vmstat.nr_dirtied
   4412053            +9.1%    4815253        proc-vmstat.nr_dirty
   7485090            +5.0%    7855964        proc-vmstat.nr_file_pages
  24899858            -1.5%   24530112        proc-vmstat.nr_free_pages
  24705120            -1.5%   24343672        proc-vmstat.nr_free_pages_blocks
   6573042            +5.6%    6943969        proc-vmstat.nr_inactive_file
     34473 ±  3%      +7.5%      37072        proc-vmstat.nr_writeback
   6573042            +5.6%    6943969        proc-vmstat.nr_zone_inactive_file
   4446526            +9.1%    4852325        proc-vmstat.nr_zone_write_pending
   7963041            +3.8%    8262916        proc-vmstat.pgalloc_normal
      0.02 ± 10%     +45.0%       0.03 ±  5%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    317.88 ±166%    -100.0%       0.08 ± 60%  perf-sched.sch_delay.avg.ms.kthreadd.ret_from_fork.ret_from_fork_asm
    474.99 ±141%    -100.0%       0.10 ± 49%  perf-sched.sch_delay.max.ms.kthreadd.ret_from_fork.ret_from_fork_asm
     17.87 ± 13%    +125.8%      40.36 ±  4%  perf-sched.wait_and_delay.avg.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
     47.75           +19.8%      57.20 ±  5%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
    517.00           -17.2%     427.83 ±  5%  perf-sched.wait_and_delay.count.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
     54.05 ±  2%    +253.0%     190.80 ± 18%  perf-sched.wait_and_delay.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
      4286 ±  4%      -8.8%       3909 ±  8%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     17.77 ± 13%    +126.0%      40.16 ±  4%  perf-sched.wait_time.avg.ms.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
     47.63           +19.8%      57.06 ±  5%  perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
     53.95 ±  2%    +253.5%     190.70 ± 18%  perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited_flags
      4285 ±  4%      -8.9%       3906 ±  7%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.77 ± 15%      -0.3        0.43 ± 72%  perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue
      1.76 ±  9%      -0.3        1.44 ±  8%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.73 ± 10%      -0.3        1.43 ±  8%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.85 ± 12%      -0.2        0.66 ± 14%  perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule_idle.do_idle.cpu_startup_entry
      4.74 ±  6%      -0.6        4.17 ±  9%  perf-profile.children.cycles-pp.__handle_mm_fault
      4.92 ±  6%      -0.6        4.36 ±  9%  perf-profile.children.cycles-pp.handle_mm_fault
      2.85 ±  5%      -0.5        2.34 ±  8%  perf-profile.children.cycles-pp.enqueue_task
      2.60 ±  4%      -0.4        2.15 ±  8%  perf-profile.children.cycles-pp.enqueue_task_fair
      2.58 ±  6%      -0.4        2.22 ±  8%  perf-profile.children.cycles-pp.do_pte_missing
      2.04 ±  9%      -0.3        1.70 ± 10%  perf-profile.children.cycles-pp.do_read_fault
      1.88 ±  5%      -0.3        1.56 ±  7%  perf-profile.children.cycles-pp.ttwu_do_activate
      1.85 ± 10%      -0.3        1.58 ± 12%  perf-profile.children.cycles-pp.filemap_map_pages
      1.20 ±  8%      -0.2        0.98 ±  9%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.49 ± 23%      -0.2        0.29 ± 23%  perf-profile.children.cycles-pp.set_next_task_fair
      0.38 ± 32%      -0.2        0.20 ± 39%  perf-profile.children.cycles-pp.strnlen_user
      0.44 ± 28%      -0.2        0.26 ± 27%  perf-profile.children.cycles-pp.set_next_entity
      0.70 ±  7%      -0.2        0.52 ±  7%  perf-profile.children.cycles-pp.folios_put_refs
      0.22 ± 20%      -0.1        0.15 ± 27%  perf-profile.children.cycles-pp.try_charge_memcg
      0.02 ±141%      +0.1        0.08 ± 44%  perf-profile.children.cycles-pp.__blk_mq_alloc_driver_tag
      0.09 ± 59%      +0.1        0.18 ± 21%  perf-profile.children.cycles-pp.irq_work_tick
      0.26 ± 22%      +0.2        0.41 ± 13%  perf-profile.children.cycles-pp.cpu_stop_queue_work
      0.34 ± 31%      +0.2        0.57 ± 27%  perf-profile.children.cycles-pp.perf_event_mmap_event
      2.68 ± 10%      +0.4        3.10 ± 12%  perf-profile.children.cycles-pp.sched_balance_domains
      0.38 ± 32%      -0.2        0.19 ± 44%  perf-profile.self.cycles-pp.strnlen_user
      0.02 ±141%      +0.1        0.08 ± 44%  perf-profile.self.cycles-pp.ahci_single_level_irq_intr
      0.22 ± 36%      +0.2        0.43 ± 21%  perf-profile.self.cycles-pp.sched_balance_rq
      0.36 ± 41%      +0.3        0.66 ± 18%  perf-profile.self.cycles-pp._find_next_and_bit





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



