linux-bcache.vger.kernel.org archive mirror
* [axboe-block:for-next] [block]  1122c0c1cc:  aim7.jobs-per-min 22.6% improvement
@ 2024-06-25  2:28 kernel test robot
  2024-06-25  8:57 ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: kernel test robot @ 2024-06-25  2:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: oe-lkp, lkp, Jens Axboe, Ulf Hansson, Damien Le Moal,
	Hannes Reinecke, linux-block, linux-um, drbd-dev, nbd,
	linuxppc-dev, virtualization, xen-devel, linux-bcache, dm-devel,
	linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme, linux-scsi,
	ying.huang, feng.tang, fengwei.yin, oliver.sang



Hello,

kernel test robot noticed a 22.6% improvement of aim7.jobs-per-min on:


commit: 1122c0c1cc71f740fa4d5f14f239194e06a1d5e7 ("block: move cache control settings out of queue->flags")
https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next

testcase: aim7
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
parameters:

	disk: 4BRD_12G
	md: RAID0
	fs: xfs
	test: sync_disk_rw
	load: 300
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240625/202406250948.e0044f1d-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
  gcc-13/performance/4BRD_12G/xfs/x86_64-rhel-8.3/300/RAID0/debian-12-x86_64-20240206.cgz/lkp-csl-2sp3/sync_disk_rw/aim7

commit: 
  70905f8706 ("block: remove blk_flush_policy")
  1122c0c1cc ("block: move cache control settings out of queue->flags")

70905f8706b62113 1122c0c1cc71f740fa4d5f14f23 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    153.19           -13.3%     132.81        uptime.boot
   2.8e+09           -11.9%  2.466e+09        cpuidle..time
  21945319 ±  2%     -40.4%   13076160        cpuidle..usage
     29.31            +7.8%      31.58 ±  2%  iostat.cpu.idle
     69.87            -3.6%      67.35        iostat.cpu.system
      0.04 ±  4%      +0.0        0.08 ±  5%  mpstat.cpu.all.iowait%
      0.78 ±  2%      +0.2        0.99 ±  2%  mpstat.cpu.all.usr%
     52860 ± 49%     -78.2%      11536 ± 78%  numa-numastat.node0.other_node
     46804 ± 56%     +88.4%      88190 ± 10%  numa-numastat.node1.other_node
    955871 ± 10%     -43.3%     542216 ± 14%  numa-meminfo.node1.Active
    955871 ± 10%     -43.3%     542216 ± 14%  numa-meminfo.node1.Active(anon)
   1015354 ± 10%     -34.7%     662696 ± 13%  numa-meminfo.node1.Shmem
      6008           -14.3%       5146 ±  2%  perf-c2c.DRAM.remote
      7889           -12.4%       6908 ±  2%  perf-c2c.HITM.local
      3839           -16.5%       3203 ±  2%  perf-c2c.HITM.remote
     11728           -13.8%      10112 ±  2%  perf-c2c.HITM.total
    695109           +20.5%     837625        vmstat.io.bo
    105.99 ±  7%     -23.7%      80.83 ± 11%  vmstat.procs.r
    803244           -30.9%     555360        vmstat.system.cs
    209736           -12.9%     182626        vmstat.system.in
      1448 ± 89%    +207.9%       4459 ±  6%  numa-vmstat.node0.nr_page_table_pages
     52860 ± 49%     -78.2%      11536 ± 78%  numa-vmstat.node0.numa_other
    239214 ± 10%     -43.6%     134883 ± 13%  numa-vmstat.node1.nr_active_anon
    254124 ± 10%     -34.9%     165421 ± 13%  numa-vmstat.node1.nr_shmem
    239214 ± 10%     -43.6%     134883 ± 13%  numa-vmstat.node1.nr_zone_active_anon
     46805 ± 56%     +88.4%      88190 ± 10%  numa-vmstat.node1.numa_other
     17374           +22.6%      21299        aim7.jobs-per-min
    103.64           -18.4%      84.58        aim7.time.elapsed_time
    103.64           -18.4%      84.58        aim7.time.elapsed_time.max
   4641240           -83.4%     770073        aim7.time.involuntary_context_switches
     32705            -4.3%      31289 ±  2%  aim7.time.minor_page_faults
      6562            -3.1%       6359        aim7.time.percent_of_cpu_this_job_got
      6775           -21.0%       5351 ±  2%  aim7.time.system_time
  49095202           -38.3%   30299361        aim7.time.voluntary_context_switches
   1297567           -37.0%     817692        meminfo.Active
   1297567           -37.0%     817692        meminfo.Active(anon)
     97760 ±  5%     -23.4%      74859 ± 20%  meminfo.AnonHugePages
   2390317           -15.3%    2024905        meminfo.Committed_AS
    884407           +11.9%     989723        meminfo.Inactive
    743152 ±  2%     +14.8%     853331        meminfo.Inactive(anon)
    159265 ±  8%     +38.6%     220668 ±  3%  meminfo.Mapped
   1382079           -27.1%    1007445        meminfo.Shmem
    324534           -37.2%     203663 ±  2%  proc-vmstat.nr_active_anon
   1165686            -8.2%    1070277        proc-vmstat.nr_file_pages
    185928 ±  2%     +14.9%     213697        proc-vmstat.nr_inactive_anon
     35436            -2.9%      34420        proc-vmstat.nr_inactive_file
     40463 ±  8%     +38.2%      55918 ±  3%  proc-vmstat.nr_mapped
    345824           -27.3%     251424        proc-vmstat.nr_shmem
     28871            -1.4%      28477        proc-vmstat.nr_slab_reclaimable
    324534           -37.2%     203663 ±  2%  proc-vmstat.nr_zone_active_anon
    185928 ±  2%     +14.9%     213697        proc-vmstat.nr_zone_inactive_anon
     35436            -2.9%      34420        proc-vmstat.nr_zone_inactive_file
   5120744            -2.4%    4996195        proc-vmstat.numa_hit
   5020486            -2.5%    4896473        proc-vmstat.numa_local
    207026 ± 10%     +50.2%     310941        proc-vmstat.pgactivate
   5196440            -2.7%    5057618        proc-vmstat.pgalloc_normal
    763396 ±  6%     -11.8%     673464        proc-vmstat.pgfault
  74254490            -1.3%   73292473        proc-vmstat.pgpgout
     11.25 ± 24%     -60.0%       4.50 ± 29%  sched_debug.cfs_rq:/.h_nr_running.max
      1.59 ± 20%     -42.7%       0.91 ± 13%  sched_debug.cfs_rq:/.h_nr_running.stddev
    968.29 ±  5%     -13.2%     840.04 ±  5%  sched_debug.cfs_rq:/.runnable_avg.avg
      5533 ± 21%     -47.1%       2925 ± 21%  sched_debug.cfs_rq:/.runnable_avg.max
    798.88 ± 13%     -38.3%     492.63 ±  9%  sched_debug.cfs_rq:/.runnable_avg.stddev
    578.50 ±  5%      -9.9%     521.30 ±  4%  sched_debug.cfs_rq:/.util_avg.avg
      3120 ± 20%     -40.3%       1862 ± 19%  sched_debug.cfs_rq:/.util_avg.max
    479.36 ± 12%     -30.4%     333.40 ±  8%  sched_debug.cfs_rq:/.util_avg.stddev
      4592 ± 24%     -51.8%       2215 ± 31%  sched_debug.cfs_rq:/.util_est.max
    615.47 ± 21%     -35.7%     395.64 ± 15%  sched_debug.cfs_rq:/.util_est.stddev
     11.33 ± 24%     -58.8%       4.67 ± 26%  sched_debug.cpu.nr_running.max
      1.62 ± 20%     -42.6%       0.93 ± 11%  sched_debug.cpu.nr_running.stddev
    224323           -28.2%     161088        sched_debug.cpu.nr_switches.avg
    242363 ±  2%     -27.9%     174695 ±  2%  sched_debug.cpu.nr_switches.max
    197870 ±  2%     -27.6%     143186        sched_debug.cpu.nr_switches.min
      7911 ± 19%     -33.1%       5295 ± 10%  sched_debug.cpu.nr_switches.stddev
      1.23            -4.8%       1.17        perf-stat.i.MPKI
 1.105e+10            +5.6%  1.167e+10        perf-stat.i.branch-instructions
      1.20 ±  2%      +0.1        1.29 ±  2%  perf-stat.i.branch-miss-rate%
    820863           -30.7%     569230        perf-stat.i.context-switches
      3.79           -10.2%       3.41        perf-stat.i.cpi
 2.176e+11            -3.2%  2.106e+11        perf-stat.i.cpu-cycles
    212040           -27.8%     153137        perf-stat.i.cpu-migrations
 5.416e+10            +6.8%  5.785e+10        perf-stat.i.instructions
      0.32           +11.8%       0.36        perf-stat.i.ipc
      0.05 ± 77%    +233.9%       0.17 ± 50%  perf-stat.i.major-faults
     10.74           -30.2%       7.50        perf-stat.i.metric.K/sec
      1.28            -4.3%       1.22        perf-stat.overall.MPKI
      4.02            -9.4%       3.64        perf-stat.overall.cpi
      3145            -5.3%       2979        perf-stat.overall.cycles-between-cache-misses
      0.25           +10.3%       0.27        perf-stat.overall.ipc
 1.094e+10            +5.4%  1.153e+10        perf-stat.ps.branch-instructions
    812563           -30.8%     562343        perf-stat.ps.context-switches
 2.156e+11            -3.4%  2.082e+11        perf-stat.ps.cpu-cycles
    209965           -28.0%     151248        perf-stat.ps.cpu-migrations
 5.365e+10            +6.6%  5.717e+10        perf-stat.ps.instructions
 5.641e+12           -13.1%  4.905e+12 ±  2%  perf-stat.total.instructions
     14.88 ±  5%     -14.9        0.00        perf-profile.calltrace.cycles-pp.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_write.vfs_write.ksys_write
     14.86 ±  5%     -14.9        0.00        perf-profile.calltrace.cycles-pp.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_write.vfs_write
     14.77 ±  5%     -14.8        0.00        perf-profile.calltrace.cycles-pp.__submit_bio_noacct.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_write
     14.76 ±  5%     -14.8        0.00        perf-profile.calltrace.cycles-pp.__submit_bio.__submit_bio_noacct.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync
     14.74 ±  5%     -14.7        0.00        perf-profile.calltrace.cycles-pp.md_handle_request.__submit_bio.__submit_bio_noacct.submit_bio_wait.blkdev_issue_flush
     14.72 ±  5%     -14.7        0.00        perf-profile.calltrace.cycles-pp.raid0_make_request.md_handle_request.__submit_bio.__submit_bio_noacct.submit_bio_wait
     14.71 ±  5%     -14.7        0.00        perf-profile.calltrace.cycles-pp.md_flush_request.raid0_make_request.md_handle_request.__submit_bio.__submit_bio_noacct
     13.32 ±  5%     -13.3        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.md_flush_request.raid0_make_request.md_handle_request.__submit_bio
     13.25 ±  5%     -13.3        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.md_flush_request.raid0_make_request.md_handle_request
      9.70 ±  3%      -1.1        8.61 ±  3%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
      9.70 ±  3%      -1.1        8.61 ±  3%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
      9.70 ±  3%      -1.1        8.61 ±  3%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      9.80 ±  3%      -1.1        8.71 ±  3%  perf-profile.calltrace.cycles-pp.common_startup_64
      9.12 ±  3%      -1.0        8.15 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      8.95 ±  3%      -0.9        8.01 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      8.95 ±  3%      -0.9        8.02 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      2.21            -0.4        1.78 ±  2%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.22            -0.4        1.79 ±  2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      2.22            -0.4        1.79 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      2.22            -0.4        1.79 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      2.08            -0.4        1.68 ±  2%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      3.09            -0.2        2.86 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.xfs_log_force_seq
      3.10            -0.2        2.87 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync
      3.10            -0.2        2.87 ±  2%  perf-profile.calltrace.cycles-pp.remove_wait_queue.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
      3.44            -0.2        3.23 ±  4%  perf-profile.calltrace.cycles-pp.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write.vfs_write
      0.95            +0.1        1.04        perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
      0.57            +0.1        0.71 ±  2%  perf-profile.calltrace.cycles-pp.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64
      0.58 ±  2%      +0.3        0.84 ±  3%  perf-profile.calltrace.cycles-pp.xfs_end_ioend.xfs_end_io.process_one_work.worker_thread.kthread
      0.59 ±  2%      +0.3        0.85 ±  2%  perf-profile.calltrace.cycles-pp.xfs_end_io.process_one_work.worker_thread.kthread.ret_from_fork
      0.90 ±  2%      +0.4        1.27 ±  3%  perf-profile.calltrace.cycles-pp.__submit_bio_noacct.iomap_submit_ioend.iomap_writepages.xfs_vm_writepages.do_writepages
      0.88 ±  2%      +0.4        1.26 ±  3%  perf-profile.calltrace.cycles-pp.__submit_bio.__submit_bio_noacct.iomap_submit_ioend.iomap_writepages.xfs_vm_writepages
      0.92 ±  3%      +0.4        1.30 ±  3%  perf-profile.calltrace.cycles-pp.iomap_submit_ioend.iomap_writepages.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc
      0.57 ±  3%      +0.4        0.95 ±  6%  perf-profile.calltrace.cycles-pp.xlog_cil_commit.__xfs_trans_commit.xfs_vn_update_time.kiocb_modified.xfs_file_write_checks
      0.64 ±  3%      +0.4        1.03 ±  6%  perf-profile.calltrace.cycles-pp.__xfs_trans_commit.xfs_vn_update_time.kiocb_modified.xfs_file_write_checks.xfs_file_buffered_write
      6.90 ±  2%      +0.5        7.40 ±  3%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      0.92 ±  4%      +0.5        1.43 ±  6%  perf-profile.calltrace.cycles-pp.xfs_vn_update_time.kiocb_modified.xfs_file_write_checks.xfs_file_buffered_write.vfs_write
      0.00            +0.5        0.52        perf-profile.calltrace.cycles-pp.complete.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force_seq
      0.94 ±  4%      +0.5        1.46 ±  6%  perf-profile.calltrace.cycles-pp.kiocb_modified.xfs_file_write_checks.xfs_file_buffered_write.vfs_write.ksys_write
      0.96 ±  4%      +0.5        1.48 ±  6%  perf-profile.calltrace.cycles-pp.xfs_file_write_checks.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64
      0.00            +0.5        0.54 ±  2%  perf-profile.calltrace.cycles-pp.xfs_iomap_write_unwritten.xfs_end_ioend.xfs_end_io.process_one_work.worker_thread
      0.00            +0.5        0.55 ±  2%  perf-profile.calltrace.cycles-pp.iomap_write_iter.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write.ksys_write
      0.00            +0.6        0.56 ± 10%  perf-profile.calltrace.cycles-pp.__folio_start_writeback.iomap_writepage_map.iomap_writepages.xfs_vm_writepages.do_writepages
      0.00            +0.6        0.57 ±  6%  perf-profile.calltrace.cycles-pp.__folio_end_writeback.folio_end_writeback.iomap_finish_ioend.md_end_clone_io.__submit_bio
      0.00            +0.6        0.58 ±  7%  perf-profile.calltrace.cycles-pp.folio_end_writeback.iomap_finish_ioend.md_end_clone_io.__submit_bio.__submit_bio_noacct
      0.00            +0.6        0.60 ±  6%  perf-profile.calltrace.cycles-pp.iomap_finish_ioend.md_end_clone_io.__submit_bio.__submit_bio_noacct.iomap_submit_ioend
      0.08 ±223%      +0.6        0.72 ±  5%  perf-profile.calltrace.cycles-pp.md_end_clone_io.__submit_bio.__submit_bio_noacct.iomap_submit_ioend.iomap_writepages
      1.45 ±  4%      +0.7        2.15 ±  4%  perf-profile.calltrace.cycles-pp.iomap_writepages.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range
      1.46 ±  4%      +0.7        2.16 ±  4%  perf-profile.calltrace.cycles-pp.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.file_write_and_wait_range
      1.48 ±  4%      +0.7        2.18 ±  4%  perf-profile.calltrace.cycles-pp.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.file_write_and_wait_range.xfs_file_fsync
      1.51 ±  4%      +0.7        2.22 ±  4%  perf-profile.calltrace.cycles-pp.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.file_write_and_wait_range.xfs_file_fsync.xfs_file_buffered_write
      1.51 ±  3%      +0.7        2.23 ±  4%  perf-profile.calltrace.cycles-pp.__filemap_fdatawrite_range.file_write_and_wait_range.xfs_file_fsync.xfs_file_buffered_write.vfs_write
      0.00            +0.7        0.72 ±  7%  perf-profile.calltrace.cycles-pp.iomap_writepage_map.iomap_writepages.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc
      1.60 ±  3%      +0.8        2.36 ±  4%  perf-profile.calltrace.cycles-pp.file_write_and_wait_range.xfs_file_fsync.xfs_file_buffered_write.vfs_write.ksys_write
     85.48            +0.8       86.24        perf-profile.calltrace.cycles-pp.xfs_file_fsync.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64
     87.06            +1.4       88.49        perf-profile.calltrace.cycles-pp.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     87.18            +1.5       88.64        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     87.36            +1.5       88.82        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     87.19            +1.5       88.65        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     87.36            +1.5       88.82        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
     87.62            +1.5       89.10        perf-profile.calltrace.cycles-pp.write
     56.74           +13.7       70.42        perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
     57.89           +13.8       71.74        perf-profile.calltrace.cycles-pp.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force_seq
     60.36           +14.6       74.96        perf-profile.calltrace.cycles-pp.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync
     61.48           +14.6       76.09        perf-profile.calltrace.cycles-pp.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
     68.74           +14.8       83.60        perf-profile.calltrace.cycles-pp.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write.vfs_write.ksys_write
     64.97           +15.1       80.03        perf-profile.calltrace.cycles-pp.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write.vfs_write
     14.86 ±  5%     -14.9        0.00        perf-profile.children.cycles-pp.submit_bio_wait
     14.96 ±  5%     -14.8        0.12 ±  4%  perf-profile.children.cycles-pp.md_handle_request
     14.94 ±  5%     -14.8        0.11 ±  3%  perf-profile.children.cycles-pp.raid0_make_request
     14.83 ±  5%     -14.8        0.00        perf-profile.children.cycles-pp.md_flush_request
     14.88 ±  5%     -14.8        0.06 ±  6%  perf-profile.children.cycles-pp.blkdev_issue_flush
     15.82 ±  5%     -14.5        1.32 ±  3%  perf-profile.children.cycles-pp.__submit_bio_noacct
     15.81 ±  5%     -14.5        1.31 ±  3%  perf-profile.children.cycles-pp.__submit_bio
     13.86 ±  5%     -13.6        0.29 ±  3%  perf-profile.children.cycles-pp._raw_spin_lock_irq
     22.32 ±  3%     -13.1        9.23 ±  4%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      1.96 ±  9%      -1.5        0.49 ±  4%  perf-profile.children.cycles-pp.intel_idle_irq
      9.70 ±  3%      -1.1        8.61 ±  3%  perf-profile.children.cycles-pp.start_secondary
      9.80 ±  3%      -1.1        8.71 ±  3%  perf-profile.children.cycles-pp.common_startup_64
      9.80 ±  3%      -1.1        8.71 ±  3%  perf-profile.children.cycles-pp.cpu_startup_entry
      9.79 ±  3%      -1.1        8.71 ±  3%  perf-profile.children.cycles-pp.do_idle
      9.20 ±  3%      -1.0        8.25 ±  3%  perf-profile.children.cycles-pp.cpuidle_idle_call
      9.04 ±  3%      -0.9        8.11 ±  3%  perf-profile.children.cycles-pp.cpuidle_enter
      9.04 ±  3%      -0.9        8.11 ±  3%  perf-profile.children.cycles-pp.cpuidle_enter_state
      2.21            -0.4        1.78 ±  2%  perf-profile.children.cycles-pp.worker_thread
      2.22            -0.4        1.79 ±  2%  perf-profile.children.cycles-pp.kthread
      2.22            -0.4        1.79 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
      2.22            -0.4        1.79 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
      2.08            -0.4        1.68 ±  2%  perf-profile.children.cycles-pp.process_one_work
      0.57            -0.3        0.24        perf-profile.children.cycles-pp.__wake_up
      0.63            -0.3        0.32 ±  2%  perf-profile.children.cycles-pp.__wake_up_common
      1.26            -0.3        0.99        perf-profile.children.cycles-pp.try_to_wake_up
      3.56 ±  2%      -0.2        3.34 ±  4%  perf-profile.children.cycles-pp.xlog_wait_on_iclog
      0.46 ±  2%      -0.1        0.36 ±  2%  perf-profile.children.cycles-pp.select_task_rq
      0.86 ±  3%      -0.1        0.75 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.43 ±  2%      -0.1        0.33 ±  2%  perf-profile.children.cycles-pp.select_task_rq_fair
      0.64            -0.1        0.55 ±  2%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.71 ±  3%      -0.1        0.62 ±  3%  perf-profile.children.cycles-pp.activate_task
      0.57            -0.1        0.48        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.17 ±  2%      -0.1        0.08        perf-profile.children.cycles-pp.xlog_state_release_iclog
      0.48            -0.1        0.41 ±  2%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.61 ±  3%      -0.1        0.54 ±  3%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.28 ±  3%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.select_idle_sibling
      0.19            -0.1        0.13 ±  2%  perf-profile.children.cycles-pp.schedule_idle
      0.22 ±  3%      -0.1        0.16 ±  4%  perf-profile.children.cycles-pp.select_idle_cpu
      0.47 ±  4%      -0.1        0.41 ±  5%  perf-profile.children.cycles-pp.update_load_avg
      0.35 ±  2%      -0.1        0.29 ±  2%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.42 ±  3%      -0.1        0.37 ±  2%  perf-profile.children.cycles-pp.enqueue_entity
      0.11 ±  6%      -0.1        0.06 ±  8%  perf-profile.children.cycles-pp.finish_task_switch
      0.18 ±  5%      -0.0        0.13 ±  5%  perf-profile.children.cycles-pp.available_idle_cpu
      0.33            -0.0        0.28        perf-profile.children.cycles-pp.xlog_write
      0.12 ±  3%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.xlog_write_partial
      0.30 ±  3%      -0.0        0.25 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.12 ±  4%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.xlog_write_get_more_iclog_space
      0.37 ±  5%      -0.0        0.32 ±  8%  perf-profile.children.cycles-pp.dequeue_entity
      0.08            -0.0        0.03 ± 70%  perf-profile.children.cycles-pp.__cond_resched
      0.46            -0.0        0.41        perf-profile.children.cycles-pp.xlog_cil_push_work
      0.27 ±  3%      -0.0        0.23 ±  3%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.08 ±  6%      -0.0        0.04 ± 44%  perf-profile.children.cycles-pp.select_idle_core
      0.26 ±  2%      -0.0        0.22 ±  3%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.12 ±  3%      -0.0        0.09 ±  5%  perf-profile.children.cycles-pp.queue_work_on
      0.14 ±  3%      -0.0        0.12 ±  6%  perf-profile.children.cycles-pp.prepare_task_switch
      0.12 ±  3%      -0.0        0.09        perf-profile.children.cycles-pp.ttwu_queue_wakelist
      0.26 ±  5%      -0.0        0.23 ±  6%  perf-profile.children.cycles-pp.update_curr
      0.12            -0.0        0.10 ±  5%  perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
      0.13 ±  3%      -0.0        0.11        perf-profile.children.cycles-pp.wake_affine
      0.08 ±  4%      -0.0        0.06 ±  8%  perf-profile.children.cycles-pp.set_next_entity
      0.10 ±  5%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp.kick_pool
      0.11 ±  4%      -0.0        0.09 ±  4%  perf-profile.children.cycles-pp.__queue_work
      0.10 ±  3%      -0.0        0.08 ±  4%  perf-profile.children.cycles-pp.__switch_to_asm
      0.10 ±  4%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.07            -0.0        0.05        perf-profile.children.cycles-pp.__smp_call_single_queue
      0.11            -0.0        0.09        perf-profile.children.cycles-pp.xlog_cil_set_ctx_write_state
      0.10            -0.0        0.08 ±  4%  perf-profile.children.cycles-pp.task_h_load
      0.08 ±  4%      -0.0        0.06        perf-profile.children.cycles-pp.sched_mm_cid_migrate_to
      0.08 ±  4%      -0.0        0.06        perf-profile.children.cycles-pp.set_task_cpu
      0.07 ±  5%      -0.0        0.05        perf-profile.children.cycles-pp.__switch_to
      0.13 ±  4%      -0.0        0.11 ±  3%  perf-profile.children.cycles-pp.menu_select
      0.13 ±  6%      -0.0        0.11 ±  5%  perf-profile.children.cycles-pp.reweight_entity
      0.11            -0.0        0.09 ±  4%  perf-profile.children.cycles-pp.xlog_cil_write_commit_record
      0.06 ±  6%      -0.0        0.05        perf-profile.children.cycles-pp.___perf_sw_event
      0.08 ±  5%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp.avg_vruntime
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.perf_tp_event
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.place_entity
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.sched_clock
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.rep_movs_alternative
      0.05            +0.0        0.06 ±  6%  perf-profile.children.cycles-pp.kfree
      0.06            +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.copy_page_from_iter_atomic
      0.10 ±  3%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.xfs_inode_item_format_data_fork
      0.05            +0.0        0.06 ±  7%  perf-profile.children.cycles-pp.xfs_trans_read_buf_map
      0.06            +0.0        0.07 ±  6%  perf-profile.children.cycles-pp.xfs_btree_lookup_get_block
      0.07 ±  5%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.filemap_get_entry
      0.09 ±  5%      +0.0        0.10 ±  3%  perf-profile.children.cycles-pp.memcpy_orig
      0.12 ±  3%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.xlog_state_clean_iclog
      0.07 ±  5%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.filemap_dirty_folio
      0.07            +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.iomap_set_range_uptodate
      0.07 ±  5%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.writeback_get_folio
      0.07            +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.xfs_end_bio
      0.06 ±  9%      +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.io_schedule
      0.10            +0.0        0.12 ±  3%  perf-profile.children.cycles-pp.xfs_buffered_write_iomap_begin
      0.09            +0.0        0.11 ±  6%  perf-profile.children.cycles-pp.xfs_btree_lookup
      0.10 ±  3%      +0.0        0.12 ±  5%  perf-profile.children.cycles-pp.writeback_iter
      0.09            +0.0        0.11        perf-profile.children.cycles-pp.xfs_trans_committed_bulk
      0.26            +0.0        0.28        perf-profile.children.cycles-pp.flush_workqueue_prep_pwqs
      0.10            +0.0        0.12 ±  3%  perf-profile.children.cycles-pp.__filemap_get_folio
      0.07 ±  7%      +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.folio_wait_bit_common
      0.16 ±  3%      +0.0        0.19 ±  3%  perf-profile.children.cycles-pp.xfs_inode_item_format
      0.08 ±  5%      +0.0        0.11        perf-profile.children.cycles-pp.__filemap_fdatawait_range
      0.07 ±  5%      +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.wake_page_function
      0.07 ±  7%      +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.folio_wait_writeback
      0.12 ±  4%      +0.0        0.14 ±  2%  perf-profile.children.cycles-pp.iomap_writepage_map_blocks
      0.07 ±  6%      +0.0        0.10 ±  5%  perf-profile.children.cycles-pp.folio_wake_bit
      0.13 ±  2%      +0.0        0.16 ±  2%  perf-profile.children.cycles-pp.llseek
      0.03 ± 70%      +0.0        0.06        perf-profile.children.cycles-pp.get_jiffies_update
      0.12 ±  3%      +0.0        0.15 ±  2%  perf-profile.children.cycles-pp.iomap_iter
      0.14 ±  5%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.__mutex_unlock_slowpath
      0.03 ± 70%      +0.0        0.06 ±  6%  perf-profile.children.cycles-pp.tmigr_requires_handle_remote
      0.04 ± 44%      +0.0        0.07        perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      0.14 ±  2%      +0.0        0.17 ±  4%  perf-profile.children.cycles-pp.iomap_write_end
      0.04 ± 45%      +0.0        0.07 ±  6%  perf-profile.children.cycles-pp.xfs_trans_alloc_inode
      0.03 ± 70%      +0.0        0.06 ±  7%  perf-profile.children.cycles-pp.xfs_map_blocks
      0.15 ±  3%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.iomap_write_begin
      0.11 ±  5%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.wake_up_q
      0.14 ±  3%      +0.0        0.17 ±  3%  perf-profile.children.cycles-pp.xlog_cil_committed
      0.14 ±  3%      +0.0        0.17 ±  2%  perf-profile.children.cycles-pp.xlog_cil_process_committed
      0.03 ± 70%      +0.0        0.07 ±  8%  perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.22            +0.0        0.26 ±  2%  perf-profile.children.cycles-pp.xlog_cil_insert_format_items
      0.15 ±  2%      +0.0        0.19 ±  5%  perf-profile.children.cycles-pp.xfs_bmap_add_extent_unwritten_real
      0.16 ±  2%      +0.0        0.20 ±  5%  perf-profile.children.cycles-pp.xfs_bmapi_convert_unwritten
      0.02 ±141%      +0.0        0.06 ± 13%  perf-profile.children.cycles-pp.xlog_grant_push_threshold
      0.28 ±  4%      +0.0        0.32 ±  2%  perf-profile.children.cycles-pp.update_process_times
      0.15            +0.0        0.19        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.32 ±  3%      +0.0        0.36 ±  3%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.18 ±  2%      +0.0        0.23 ±  4%  perf-profile.children.cycles-pp.xfs_bmapi_write
      0.27 ±  2%      +0.0        0.32        perf-profile.children.cycles-pp.xlog_ioend_work
      0.36 ±  4%      +0.0        0.41 ±  3%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.26 ±  2%      +0.0        0.31        perf-profile.children.cycles-pp.xlog_state_do_callback
      0.26 ±  2%      +0.0        0.31        perf-profile.children.cycles-pp.xlog_state_do_iclog_callbacks
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.xa_load
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.xfs_iext_lookup_extent
      0.02 ±141%      +0.1        0.07 ±  5%  perf-profile.children.cycles-pp.up_write
      0.31 ±  2%      +0.1        0.38 ±  2%  perf-profile.children.cycles-pp.xlog_cil_insert_items
      0.41 ±  4%      +0.1        0.47 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.41 ±  3%      +0.1        0.48 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.13 ± 12%      +0.1        0.20 ±  8%  perf-profile.children.cycles-pp.xfs_log_ticket_ungrant
      0.30            +0.1        0.38 ±  3%  perf-profile.children.cycles-pp.copy_to_brd
      0.56 ±  3%      +0.1        0.64 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.35            +0.1        0.43 ±  3%  perf-profile.children.cycles-pp.brd_submit_bio
      0.95            +0.1        1.04        perf-profile.children.cycles-pp.mutex_spin_on_owner
      0.11 ± 11%      +0.1        0.21 ± 12%  perf-profile.children.cycles-pp.xlog_grant_add_space
      0.44            +0.1        0.55 ±  2%  perf-profile.children.cycles-pp.iomap_write_iter
      0.19 ±  5%      +0.1        0.30 ±  6%  perf-profile.children.cycles-pp.iomap_finish_ioends
      0.21 ± 11%      +0.1        0.35 ± 12%  perf-profile.children.cycles-pp.xfs_log_reserve
      0.22 ± 11%      +0.1        0.36 ± 11%  perf-profile.children.cycles-pp.xfs_trans_reserve
      0.40 ±  2%      +0.1        0.54 ±  2%  perf-profile.children.cycles-pp.xfs_iomap_write_unwritten
      0.57            +0.1        0.71 ±  2%  perf-profile.children.cycles-pp.iomap_file_buffered_write
      0.25 ± 10%      +0.1        0.39 ± 10%  perf-profile.children.cycles-pp.xfs_trans_alloc
      0.13 ± 11%      +0.2        0.32 ± 16%  perf-profile.children.cycles-pp.schedule_preempt_disabled
      0.23 ± 13%      +0.2        0.46 ± 12%  perf-profile.children.cycles-pp.sb_mark_inode_writeback
      0.25 ± 12%      +0.2        0.50 ± 12%  perf-profile.children.cycles-pp.sb_clear_inode_writeback
      0.59 ±  2%      +0.3        0.85 ±  2%  perf-profile.children.cycles-pp.xfs_end_io
      0.58 ±  2%      +0.3        0.84 ±  3%  perf-profile.children.cycles-pp.xfs_end_ioend
      0.46 ±  6%      +0.3        0.72 ±  6%  perf-profile.children.cycles-pp.md_end_clone_io
      0.30 ± 10%      +0.3        0.57 ±  9%  perf-profile.children.cycles-pp.__folio_start_writeback
      0.11 ± 11%      +0.3        0.38 ± 13%  perf-profile.children.cycles-pp.rwsem_down_read_slowpath
      0.43 ±  7%      +0.3        0.72 ±  7%  perf-profile.children.cycles-pp.iomap_writepage_map
      0.16 ±  9%      +0.3        0.46 ± 11%  perf-profile.children.cycles-pp.down_read
      0.44 ±  8%      +0.3        0.76 ±  7%  perf-profile.children.cycles-pp.__folio_end_writeback
      0.52 ±  7%      +0.4        0.88 ±  6%  perf-profile.children.cycles-pp.folio_end_writeback
      0.54 ±  7%      +0.4        0.90 ±  6%  perf-profile.children.cycles-pp.iomap_finish_ioend
      0.92 ±  2%      +0.4        1.30 ±  3%  perf-profile.children.cycles-pp.iomap_submit_ioend
      0.72 ±  3%      +0.4        1.16 ±  5%  perf-profile.children.cycles-pp.xlog_cil_commit
      0.82 ±  3%      +0.5        1.28 ±  5%  perf-profile.children.cycles-pp.__xfs_trans_commit
      0.92 ±  4%      +0.5        1.43 ±  6%  perf-profile.children.cycles-pp.xfs_vn_update_time
      0.94 ±  4%      +0.5        1.46 ±  6%  perf-profile.children.cycles-pp.kiocb_modified
      0.96 ±  4%      +0.5        1.48 ±  6%  perf-profile.children.cycles-pp.xfs_file_write_checks
      6.96 ±  2%      +0.5        7.49 ±  3%  perf-profile.children.cycles-pp.intel_idle
      1.45 ±  4%      +0.7        2.15 ±  5%  perf-profile.children.cycles-pp.iomap_writepages
      1.46 ±  4%      +0.7        2.16 ±  4%  perf-profile.children.cycles-pp.xfs_vm_writepages
      1.48 ±  4%      +0.7        2.18 ±  4%  perf-profile.children.cycles-pp.do_writepages
      1.51 ±  4%      +0.7        2.22 ±  4%  perf-profile.children.cycles-pp.filemap_fdatawrite_wbc
      1.51 ±  3%      +0.7        2.23 ±  4%  perf-profile.children.cycles-pp.__filemap_fdatawrite_range
      1.61 ±  3%      +0.8        2.36 ±  4%  perf-profile.children.cycles-pp.file_write_and_wait_range
     85.48            +0.8       86.24        perf-profile.children.cycles-pp.xfs_file_fsync
     87.06            +1.4       88.49        perf-profile.children.cycles-pp.xfs_file_buffered_write
     87.19            +1.5       88.65        perf-profile.children.cycles-pp.vfs_write
     87.20            +1.5       88.66        perf-profile.children.cycles-pp.ksys_write
     87.66            +1.5       89.14        perf-profile.children.cycles-pp.write
     87.50            +1.5       88.98        perf-profile.children.cycles-pp.do_syscall_64
     87.50            +1.5       88.99        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     56.76           +13.7       70.44        perf-profile.children.cycles-pp.osq_lock
     57.89           +13.9       71.74        perf-profile.children.cycles-pp.__mutex_lock
     60.36           +14.6       74.96        perf-profile.children.cycles-pp.__flush_workqueue
     61.49           +14.6       76.10        perf-profile.children.cycles-pp.xlog_cil_push_now
     68.74           +14.8       83.60        perf-profile.children.cycles-pp.xfs_log_force_seq
     64.98           +15.1       80.03        perf-profile.children.cycles-pp.xlog_cil_force_seq
     22.30 ±  3%     -13.1        9.22 ±  4%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      1.91 ±  9%      -1.4        0.46 ±  5%  perf-profile.self.cycles-pp.intel_idle_irq
      0.24 ±  2%      -0.1        0.18 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.18 ±  4%      -0.1        0.12 ±  6%  perf-profile.self.cycles-pp.available_idle_cpu
      0.37 ±  2%      -0.0        0.32 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.20 ±  3%      -0.0        0.17 ±  4%  perf-profile.self.cycles-pp.update_load_avg
      0.14 ±  3%      -0.0        0.11 ±  3%  perf-profile.self.cycles-pp.__schedule
      0.09 ±  4%      -0.0        0.07 ±  8%  perf-profile.self.cycles-pp.prepare_task_switch
      0.10            -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.task_h_load
      0.10 ±  5%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.__switch_to_asm
      0.08 ±  4%      -0.0        0.06        perf-profile.self.cycles-pp.sched_mm_cid_migrate_to
      0.07 ±  5%      -0.0        0.05 ±  7%  perf-profile.self.cycles-pp.menu_select
      0.09 ±  5%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.switch_mm_irqs_off
      0.06 ±  7%      -0.0        0.05        perf-profile.self.cycles-pp.__switch_to
      0.07 ±  7%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.enqueue_entity
      0.10 ±  4%      -0.0        0.09 ±  7%  perf-profile.self.cycles-pp.update_curr
      0.05            +0.0        0.06        perf-profile.self.cycles-pp.rep_movs_alternative
      0.06            +0.0        0.07 ±  5%  perf-profile.self.cycles-pp.xas_load
      0.08 ±  4%      +0.0        0.10 ±  5%  perf-profile.self.cycles-pp.__flush_workqueue
      0.07            +0.0        0.08 ±  5%  perf-profile.self.cycles-pp.iomap_set_range_uptodate
      0.08 ±  5%      +0.0        0.10 ±  3%  perf-profile.self.cycles-pp.memcpy_orig
      0.05 ±  7%      +0.0        0.07 ±  5%  perf-profile.self.cycles-pp.down_read
      0.08 ±  5%      +0.0        0.11 ±  4%  perf-profile.self.cycles-pp.__mutex_lock
      0.09 ±  4%      +0.0        0.12 ±  6%  perf-profile.self.cycles-pp.xlog_cil_insert_items
      0.03 ± 70%      +0.0        0.06        perf-profile.self.cycles-pp.get_jiffies_update
      0.02 ± 99%      +0.0        0.06 ±  7%  perf-profile.self.cycles-pp.__folio_end_writeback
      0.15            +0.0        0.19        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.10 ± 12%      +0.1        0.16 ±  9%  perf-profile.self.cycles-pp.xfs_log_ticket_ungrant
      0.00            +0.1        0.06 ± 11%  perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.30 ±  2%      +0.1        0.37 ±  2%  perf-profile.self.cycles-pp.copy_to_brd
      0.95            +0.1        1.03        perf-profile.self.cycles-pp.mutex_spin_on_owner
      0.11 ± 11%      +0.1        0.20 ± 14%  perf-profile.self.cycles-pp.xlog_grant_add_space
      6.96 ±  2%      +0.5        7.49 ±  3%  perf-profile.self.cycles-pp.intel_idle
     56.27           +13.5       69.81        perf-profile.self.cycles-pp.osq_lock




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [axboe-block:for-next] [block]  1122c0c1cc:  aim7.jobs-per-min 22.6% improvement
  2024-06-25  2:28 [axboe-block:for-next] [block] 1122c0c1cc: aim7.jobs-per-min 22.6% improvement kernel test robot
@ 2024-06-25  8:57 ` Christoph Hellwig
  2024-06-26  2:10   ` Oliver Sang
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2024-06-25  8:57 UTC (permalink / raw)
  To: kernel test robot
  Cc: Christoph Hellwig, oe-lkp, lkp, Jens Axboe, Ulf Hansson,
	Damien Le Moal, Hannes Reinecke, linux-block, linux-um, drbd-dev,
	nbd, linuxppc-dev, virtualization, xen-devel, linux-bcache,
	dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
	linux-scsi, ying.huang, feng.tang, fengwei.yin

Hi Oliver,

can you test the patch below?  It restores the previous behavior if
the device did not have a volatile write cache.  I think at least
for raid0 and raid1 without bitmap the new behavior actually is correct
and better, but it will need fixes for other modes.  If the underlying
devices did have a volatile write cache I'm a bit lost what the problem
was and this probably won't fix the issue.
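
To illustrate what goes wrong, here is a minimal sketch (illustration only, not
verbatim kernel code; the names follow the patch below): the md feature flags
were applied only once, when the gendisk was allocated, and were lost as soon
as a RAID personality rebuilt the limits from scratch and applied them via
queue_limits_set():

    /* Illustration only: before the fix, the flags were set once at
     * allocation time ...
     */
    struct queue_limits lim = {
        .features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA |
                    BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT,
    };
    struct gendisk *disk = blk_alloc_disk(&lim, NUMA_NO_NODE);

    /* ... but each personality later builds fresh limits starting from
     * blk_set_stacking_limits(), which does not set these flags, so
     * applying the new limits dropped BLK_FEAT_WRITE_CACHE and
     * BLK_FEAT_FUA again:
     */
    struct queue_limits new_lim;

    blk_set_stacking_limits(&new_lim);
    new_lim.max_hw_sectors = mddev->chunk_sectors;
    /* ... other personality-specific fields ... */
    queue_limits_set(disk->queue, &new_lim);    /* write cache/FUA gone */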

---
From 81c816827197f811e14add7a79220ed9eef6af02 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 25 Jun 2024 08:48:18 +0200
Subject: md: set md-specific flags for all queue limits

The md driver wants to enforce a number of flags on all devices, even
when not inheriting them from the underlying devices.  To make sure these
flags survive the queue_limits_set calls that md uses to update the
queue limits without deriving them from the previous limits, add a new
md_init_stacking_limits helper that calls blk_set_stacking_limits and sets
these flags.

Fixes: 1122c0c1cc71 ("block: move cache control settings out of queue->flags")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c     | 13 ++++++++-----
 drivers/md/md.h     |  1 +
 drivers/md/raid0.c  |  2 +-
 drivers/md/raid1.c  |  2 +-
 drivers/md/raid10.c |  2 +-
 drivers/md/raid5.c  |  2 +-
 6 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 69ea54aedd99a1..8368438e58e989 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5853,6 +5853,13 @@ static void mddev_delayed_delete(struct work_struct *ws)
 	kobject_put(&mddev->kobj);
 }
 
+void md_init_stacking_limits(struct queue_limits *lim)
+{
+	blk_set_stacking_limits(lim);
+	lim->features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA |
+			BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
+}
+
 struct mddev *md_alloc(dev_t dev, char *name)
 {
 	/*
@@ -5871,10 +5878,6 @@ struct mddev *md_alloc(dev_t dev, char *name)
 	int shift;
 	int unit;
 	int error;
-	struct queue_limits lim = {
-		.features		= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA |
-					  BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT,
-	};
 
 	/*
 	 * Wait for any previous instance of this device to be completely
@@ -5914,7 +5917,7 @@ struct mddev *md_alloc(dev_t dev, char *name)
 		 */
 		mddev->hold_active = UNTIL_STOP;
 
-	disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
+	disk = blk_alloc_disk(NULL, NUMA_NO_NODE);
 	if (IS_ERR(disk)) {
 		error = PTR_ERR(disk);
 		goto out_free_mddev;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index c4d7ebf9587d07..28cb4b0b6c1740 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -893,6 +893,7 @@ extern int strict_strtoul_scaled(const char *cp, unsigned long *res, int scale);
 
 extern int mddev_init(struct mddev *mddev);
 extern void mddev_destroy(struct mddev *mddev);
+void md_init_stacking_limits(struct queue_limits *lim);
 struct mddev *md_alloc(dev_t dev, char *name);
 void mddev_put(struct mddev *mddev);
 extern int md_run(struct mddev *mddev);
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 62634e2a33bd0f..32d58752477847 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -379,7 +379,7 @@ static int raid0_set_limits(struct mddev *mddev)
 	struct queue_limits lim;
 	int err;
 
-	blk_set_stacking_limits(&lim);
+	md_init_stacking_limits(&lim);
 	lim.max_hw_sectors = mddev->chunk_sectors;
 	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
 	lim.io_min = mddev->chunk_sectors << 9;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 1a0eba65b8a92b..04a0c2ca173245 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3194,7 +3194,7 @@ static int raid1_set_limits(struct mddev *mddev)
 	struct queue_limits lim;
 	int err;
 
-	blk_set_stacking_limits(&lim);
+	md_init_stacking_limits(&lim);
 	lim.max_write_zeroes_sectors = 0;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err) {
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3334aa803c8380..2a9c4ee982e023 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3974,7 +3974,7 @@ static int raid10_set_queue_limits(struct mddev *mddev)
 	struct queue_limits lim;
 	int err;
 
-	blk_set_stacking_limits(&lim);
+	md_init_stacking_limits(&lim);
 	lim.max_write_zeroes_sectors = 0;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0192a6323f09ba..10219205160bbf 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7708,7 +7708,7 @@ static int raid5_set_limits(struct mddev *mddev)
 	 */
 	stripe = roundup_pow_of_two(data_disks * (mddev->chunk_sectors << 9));
 
-	blk_set_stacking_limits(&lim);
+	md_init_stacking_limits(&lim);
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
 	lim.features |= BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE;
-- 
2.43.0



* Re: [axboe-block:for-next] [block]  1122c0c1cc:  aim7.jobs-per-min 22.6% improvement
  2024-06-25  8:57 ` Christoph Hellwig
@ 2024-06-26  2:10   ` Oliver Sang
  2024-06-26  3:39     ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Sang @ 2024-06-26  2:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, oe-lkp, lkp, Jens Axboe, Ulf Hansson,
	Damien Le Moal, Hannes Reinecke, linux-block, linux-um, drbd-dev,
	nbd, linuxppc-dev, virtualization, xen-devel, linux-bcache,
	dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
	linux-scsi, ying.huang, feng.tang, fengwei.yin, oliver.sang

hi, Christoph Hellwig,

On Tue, Jun 25, 2024 at 01:57:35AM -0700, Christoph Hellwig wrote:
> Hi Oliver,
> 
> can you test the patch below?  It restores the previous behavior if
> the device did not have a volatile write cache.  I think at least
> for raid0 and raid1 without bitmap the new behavior actually is correct
> and better, but it will need fixes for other modes.  If the underlying
> devices did have a volatile write cache I'm a bit lost what the problem
> was and this probably won't fix the issue.

I'm not sure I understand this test request. As the title says, we see a good
improvement in aim7 for 1122c0c1cc, and we didn't observe any other issues with
this commit.

Do you mean this improvement is not expected, or that it exposes some problem
instead? Then, with the patch below, should the performance go back to the
level of the parent of 1122c0c1cc?

Sure! It's our great pleasure to test your patches. I noticed there are
[1]
https://lore.kernel.org/all/20240625110603.50885-2-hch@lst.de/
which includes "[PATCH 1/7] md: set md-specific flags for all queue limits"
[2]
https://lore.kernel.org/all/20240625145955.115252-2-hch@lst.de/
which includes "[PATCH 1/8] md: set md-specific flags for all queue limits"

Which one do you suggest we test?
Do we only need to apply the first patch, "md: set md-specific flags for all
queue limits", on top of 1122c0c1cc?
And is the expectation that the performance then goes back to that of the
parent of 1122c0c1cc?

thanks



* Re: [axboe-block:for-next] [block]  1122c0c1cc:  aim7.jobs-per-min 22.6% improvement
  2024-06-26  2:10   ` Oliver Sang
@ 2024-06-26  3:39     ` Christoph Hellwig
  2024-06-27  2:35       ` Oliver Sang
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2024-06-26  3:39 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Christoph Hellwig, Christoph Hellwig, oe-lkp, lkp, Jens Axboe,
	Ulf Hansson, Damien Le Moal, Hannes Reinecke, linux-block,
	linux-um, drbd-dev, nbd, linuxppc-dev, virtualization, xen-devel,
	linux-bcache, dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm,
	linux-nvme, linux-scsi, ying.huang, feng.tang, fengwei.yin

On Wed, Jun 26, 2024 at 10:10:49AM +0800, Oliver Sang wrote:
> I'm not sure I understand this test request. as in title, we see a good
> improvement of aim7 for 1122c0c1cc, and we didn't observe other issues for
> this commit.

The improvement suggests we are not sending cache flushes when we should
send them, or at least just handle them in md.
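
A rough sketch of the mechanism (assumption: the md queue lost
BLK_FEAT_WRITE_CACHE; the helper below is hypothetical and only illustrates
the idea, it is not the actual blk-flush code): once a queue no longer
advertises a volatile write cache, REQ_PREFLUSH bios can complete without
doing any work, so the xfs_file_fsync() -> blkdev_issue_flush() path never
reaches md_flush_request() and its serializing spinlock, which matches the
cycles that disappeared in the profile:

    /* Hypothetical helper, for illustration only. */
    static bool flush_can_be_skipped(struct request_queue *q, struct bio *bio)
    {
        if (!(bio->bi_opf & REQ_PREFLUSH))
            return false;
        /* No volatile write cache advertised: nothing to flush. */
        return !(q->limits.features & BLK_FEAT_WRITE_CACHE);
    }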

> do you mean this improvement is not expected or exposes some problems instead?
> then by below patch, should the performance back to the level of parent of
> 1122c0c1cc?
> 
> sure! it's our great pleasure to test your patches. I noticed there are
> [1]
> https://lore.kernel.org/all/20240625110603.50885-2-hch@lst.de/
> which includes "[PATCH 1/7] md: set md-specific flags for all queue limits"
> [2]
> https://lore.kernel.org/all/20240625145955.115252-2-hch@lst.de/
> which includes "[PATCH 1/8] md: set md-specific flags for all queue limits"
> 
> which one you suggest us to test?
> do we only need to apply the first patch "md: set md-specific flags for all queue limits"
> upon 1122c0c1cc?
> then is the expectation the performance back to parent of 1122c0c1cc?

Either just the patch in reply or the entire [2] series would be fine.

Thanks!



* Re: [axboe-block:for-next] [block]  1122c0c1cc:  aim7.jobs-per-min 22.6% improvement
  2024-06-26  3:39     ` Christoph Hellwig
@ 2024-06-27  2:35       ` Oliver Sang
  2024-06-27  4:54         ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Sang @ 2024-06-27  2:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, oe-lkp, lkp, Jens Axboe, Ulf Hansson,
	Damien Le Moal, Hannes Reinecke, linux-block, linux-um, drbd-dev,
	nbd, linuxppc-dev, virtualization, xen-devel, linux-bcache,
	dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
	linux-scsi, ying.huang, feng.tang, fengwei.yin, oliver.sang

hi, Christoph Hellwig,

On Tue, Jun 25, 2024 at 08:39:50PM -0700, Christoph Hellwig wrote:
> On Wed, Jun 26, 2024 at 10:10:49AM +0800, Oliver Sang wrote:
> > I'm not sure I understand this test request. as in title, we see a good
> > improvement of aim7 for 1122c0c1cc, and we didn't observe other issues for
> > this commit.
> 
> The improvement suggests we are not sending cache flushes when we should
> send them, or at least just handle them in md.

thanks for the explanation!

> 
> > do you mean this improvement is not expected or exposes some problems instead?
> > then by below patch, should the performance back to the level of parent of
> > 1122c0c1cc?
> > 
> > sure! it's our great pleasure to test your patches. I noticed there are
> > [1]
> > https://lore.kernel.org/all/20240625110603.50885-2-hch@lst.de/
> > which includes "[PATCH 1/7] md: set md-specific flags for all queue limits"
> > [2]
> > https://lore.kernel.org/all/20240625145955.115252-2-hch@lst.de/
> > which includes "[PATCH 1/8] md: set md-specific flags for all queue limits"
> > 
> > which one you suggest us to test?
> > do we only need to apply the first patch "md: set md-specific flags for all queue limits"
> > upon 1122c0c1cc?
> > then is the expectation the performance back to parent of 1122c0c1cc?
> 
> Either just the patch in reply or the entire [2] series would be fine.

I failed to apply the patch from your previous reply on top of 1122c0c1cc or the
current tip of axboe-block/for-next:
c1440ed442a58 (axboe-block/for-next) Merge branch 'for-6.11/block' into for-next

but it applies cleanly on top of next:
* 0fc4bfab2cd45 (tag: next-20240625) Add linux-next specific files for 20240625

I've already started the test on that base.
Is the expectation that the patch should not introduce a performance change
compared to 0fc4bfab2cd45?

If this base is not OK, please just give me guidance. Thanks!


> 
> Thanks!
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [axboe-block:for-next] [block]  1122c0c1cc:  aim7.jobs-per-min 22.6% improvement
  2024-06-27  2:35       ` Oliver Sang
@ 2024-06-27  4:54         ` Christoph Hellwig
  2024-07-01  8:22           ` Oliver Sang
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2024-06-27  4:54 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Christoph Hellwig, oe-lkp, lkp, Jens Axboe, Ulf Hansson,
	Damien Le Moal, Hannes Reinecke, linux-block, linux-um, drbd-dev,
	nbd, linuxppc-dev, virtualization, xen-devel, linux-bcache,
	dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
	linux-scsi, ying.huang, feng.tang, fengwei.yin

On Thu, Jun 27, 2024 at 10:35:38AM +0800, Oliver Sang wrote:
> 
> I failed to apply the patch from your previous reply on top of 1122c0c1cc or the
> current tip of axboe-block/for-next:
> c1440ed442a58 (axboe-block/for-next) Merge branch 'for-6.11/block' into for-next

That already includes it.

> 
> but it applies cleanly on top of next:
> * 0fc4bfab2cd45 (tag: next-20240625) Add linux-next specific files for 20240625
> 
> I've already started the test on that base.
> Is the expectation that the patch should not introduce a performance change
> compared to 0fc4bfab2cd45?
> 
> If this base is not OK, please just give me guidance. Thanks!

The expectation is that the latest block branch (and thus linux-next)
doesn't see this performance change.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [axboe-block:for-next] [block]  1122c0c1cc:  aim7.jobs-per-min 22.6% improvement
  2024-06-27  4:54         ` Christoph Hellwig
@ 2024-07-01  8:22           ` Oliver Sang
  2024-07-02  7:32             ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Sang @ 2024-07-01  8:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: oe-lkp, lkp, Jens Axboe, Ulf Hansson, Damien Le Moal,
	Hannes Reinecke, linux-block, linux-um, drbd-dev, nbd,
	linuxppc-dev, virtualization, xen-devel, linux-bcache, dm-devel,
	linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme, linux-scsi,
	ying.huang, feng.tang, fengwei.yin, oliver.sang

hi, Christoph Hellwig,

On Wed, Jun 26, 2024 at 09:54:05PM -0700, Christoph Hellwig wrote:
> On Thu, Jun 27, 2024 at 10:35:38AM +0800, Oliver Sang wrote:
> > 
> > I failed to apply the patch from your previous reply on top of 1122c0c1cc or the
> > current tip of axboe-block/for-next:
> > c1440ed442a58 (axboe-block/for-next) Merge branch 'for-6.11/block' into for-next
> 
> That already includes it.

For the patch in your previous reply [1],
the bot applied it automatically as:
* 5c683739f6c2f patch in [1]
* 0fc4bfab2cd45 (tag: next-20240625) Add linux-next specific files for 20240625

For patch set [2], the bot applied it as:
* 6490f979767736 block: move dma_pad_mask into queue_limits
* 278817f42e219b block: remove the fallback case in queue_dma_alignment
* 81afb19d619a04 block: remove disk_update_readahead
* 037d85402b8b83 block: conding style fixup for blk_queue_max_guaranteed_bio
* 4fe67425ae31a8 block: convert features and flags to __bitwise types
* e3c2d2ad4136f2 block: rename BLK_FLAG_MISALIGNED
* 33ead159243d1c block: correctly report cache type
* 6725109120e0ba md: set md-specific flags for all queue limits
*   e6d130064a02f5 Merge branch 'for-6.11/block' into for-next


but both builds failed with the following errors:
  - "ERROR: modpost: \"md_init_stacking_limits\" [drivers/md/raid456.ko] undefined!"
  - "ERROR: modpost: \"md_init_stacking_limits\" [drivers/md/raid1.ko] undefined!"
  - "ERROR: modpost: \"md_init_stacking_limits\" [drivers/md/raid0.ko] undefined!"
  - "ERROR: modpost: \"md_init_stacking_limits\" [drivers/md/raid10.ko] undefined!"


Since you mentioned that the axboe-block/for-next branch already includes the
patch set, I took a snapshot of the branch several days ago, as below:

*   bc512ae8cb934 (axboe-block/for-next) Merge branch 'for-6.11/block' into for-next   <-----------
|\
| * 18048c1af7836 (axboe-block/for-6.11/block) loop: Fix a race between loop detach and loop open
| * 63db4a1f795a1 block: Delete blk_queue_flag_test_and_set()
* | e21d05740862c Merge branch 'for-6.11/block' into for-next
|\|
| * e269537e491da block: clean up the check in blkdev_iomap_begin()
* | 9c6e1f8702d51 Merge branch 'for-6.11/block' into for-next
|\|
| * 69b6517687a4b block: use the right type for stub rq_integrity_vec()
* | c1440ed442a58 Merge branch 'for-6.11/block' into for-next
|\|
| * e94b45d08b5d1 block: move dma_pad_mask into queue_limits          <----------------
| * abfc9d810926d block: remove the fallback case in queue_dma_alignment
| * 73781b3b81e76 block: remove disk_update_readahead
| * 3302f6f090522 block: conding style fixup for blk_queue_max_guaranteed_bio
| * fcf865e357f80 block: convert features and flags to __bitwise types
| * ec9b1cf0b0ebf block: rename BLK_FEAT_MISALIGNED
| * 78887d004fb2b block: correctly report cache type
| * 573d5abf3df00 md: set md-specific flags for all queue limits       <----------------
* | 72e9cd924fccc Merge branch 'for-6.11/block' into for-next
|\|
| * cf546dd289e0f block: change rq_integrity_vec to respect the iterator  <-------------

From the results below, it seems the patch set doesn't introduce a performance
improvement any more, but rather a regression. Is this expected?

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
  gcc-13/performance/4BRD_12G/xfs/x86_64-rhel-8.3/300/RAID0/debian-12-x86_64-20240206.cgz/lkp-csl-2sp3/sync_disk_rw/aim7

cf546dd289e0f6d2 573d5abf3df00c879fbd25774e4 e94b45d08b5d1c230c0f59c3eed bc512ae8cb934ac31470bc825fa
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
     21493           -19.6%      17278           -19.2%      17371           -19.7%      17264        aim7.jobs-per-min



[1] https://lore.kernel.org/all/ZnqGf49cvy6W-xWf@infradead.org/
[2] https://lore.kernel.org/all/20240625145955.115252-2-hch@lst.de/

> 
> > 
> > but it applies cleanly on top of next:
> > * 0fc4bfab2cd45 (tag: next-20240625) Add linux-next specific files for 20240625
> > 
> > I've already started the test on that base.
> > Is the expectation that the patch should not introduce a performance change
> > compared to 0fc4bfab2cd45?
> > 
> > If this base is not OK, please just give me guidance. Thanks!
> 
> The expectation is that the latest block branch (and thus linux-next)
> doesn't see this performance change.
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [axboe-block:for-next] [block]  1122c0c1cc:  aim7.jobs-per-min 22.6% improvement
  2024-07-01  8:22           ` Oliver Sang
@ 2024-07-02  7:32             ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2024-07-02  7:32 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Christoph Hellwig, oe-lkp, lkp, Jens Axboe, Ulf Hansson,
	Damien Le Moal, Hannes Reinecke, linux-block, linux-um, drbd-dev,
	nbd, linuxppc-dev, virtualization, xen-devel, linux-bcache,
	dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
	linux-scsi, ying.huang, feng.tang, fengwei.yin

On Mon, Jul 01, 2024 at 04:22:19PM +0800, Oliver Sang wrote:
> From the results below, it seems the patch set doesn't introduce a performance
> improvement any more, but rather a regression. Is this expected?

Not having the improvement at least alleviates my concerns about data
integrity.  I'm still curious where it comes from, as it isn't exactly
expected.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-07-02  7:32 UTC | newest]

Thread overview: 8+ messages
2024-06-25  2:28 [axboe-block:for-next] [block] 1122c0c1cc: aim7.jobs-per-min 22.6% improvement kernel test robot
2024-06-25  8:57 ` Christoph Hellwig
2024-06-26  2:10   ` Oliver Sang
2024-06-26  3:39     ` Christoph Hellwig
2024-06-27  2:35       ` Oliver Sang
2024-06-27  4:54         ` Christoph Hellwig
2024-07-01  8:22           ` Oliver Sang
2024-07-02  7:32             ` Christoph Hellwig
