From: kernel test robot <oliver.sang@intel.com>
To: Qu Wenruo <wqu@suse.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-btrfs@vger.kernel.org>, <ying.huang@intel.com>,
	<feng.tang@intel.com>, <fengwei.yin@intel.com>,
	<oliver.sang@intel.com>
Subject: Re: [PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory
Date: Wed, 6 Sep 2023 10:45:10 +0800
Message-ID: <202309061050.19c12499-oliver.sang@intel.com>
In-Reply-To: <8bc15bfdaa2805d1d1b660b8b2e07a55aa02027d.1692858397.git.wqu@suse.com>



Hello,

kernel test robot noticed a 12.0% improvement in filebench.sum_operations/s on:


commit: 2fa4ac9754a7fa77bad88aae11ac77ba137d3858 ("[PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory")
url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-warn-on-tree-blocks-which-are-not-nodesize-aligned/20230824-143628
base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
patch link: https://lore.kernel.org/all/8bc15bfdaa2805d1d1b660b8b2e07a55aa02027d.1692858397.git.wqu@suse.com/
patch subject: [PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory

testcase: filebench
test machine: 96 threads 2 sockets (Ice Lake) with 128G memory
parameters:

	disk: 1HDD
	fs: btrfs
	fs2: cifs
	test: webproxy.f
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230906/202309061050.19c12499-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/1HDD/cifs/btrfs/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp1/webproxy.f/filebench

commit: 
  19e81514b8 ("btrfs: map uncontinuous extent buffer pages into virtual address space")
  2fa4ac9754 ("btrfs: utilize the physically/virtually continuous extent buffer memory")

19e81514b8c09202 2fa4ac9754a7fa77bad88aae11a 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     30592 ±194%     -92.3%       2343 ± 24%  sched_debug.cpu.avg_idle.min
      1.38            -5.9%       1.30        iostat.cpu.iowait
      4.63            +8.9%       5.04        iostat.cpu.system
      2.56            +0.5        3.09        mpstat.cpu.all.sys%
      0.54            +0.1        0.61        mpstat.cpu.all.usr%
      1996            +3.3%       2062        vmstat.io.bo
     33480           +13.5%      37993        vmstat.system.cs
    152.67           +12.6%     171.83        turbostat.Avg_MHz
      2562            +4.2%       2670        turbostat.Bzy_MHz
      5.34            +0.5        5.83        turbostat.C1E%
      7.12 ± 12%     -21.6%       5.58 ± 12%  turbostat.Pkg%pc2
    209.72            +1.5%     212.81        turbostat.PkgWatt
      4.92 ± 24%      +3.5        8.37 ± 32%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      5.13 ± 28%      +3.6        8.68 ± 31%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      5.13 ± 28%      +3.8        8.90 ± 30%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      5.13 ± 28%      +3.8        8.90 ± 30%  perf-profile.children.cycles-pp.cpuidle_enter
      5.13 ± 28%      +3.8        8.90 ± 30%  perf-profile.children.cycles-pp.cpuidle_enter_state
      5.34 ± 34%      +3.9        9.21 ± 28%  perf-profile.children.cycles-pp.cpuidle_idle_call
     13.90            +9.6%      15.23        filebench.sum_bytes_mb/s
    238030           +12.0%     266575        filebench.sum_operations
      3966           +12.0%       4442        filebench.sum_operations/s
      1043           +12.0%       1168        filebench.sum_reads/s
     25.14           -10.7%      22.46        filebench.sum_time_ms/op
    208.83           +11.9%     233.67        filebench.sum_writes/s
    506705            +5.8%     536097        filebench.time.file_system_outputs
      1597 ±  5%     -36.1%       1020 ±  3%  filebench.time.involuntary_context_switches
     61810 ±  2%      +6.0%      65519        filebench.time.minor_page_faults
    157.67 ±  2%     +31.5%     207.33        filebench.time.percent_of_cpu_this_job_got
    117.60 ±  2%     +27.1%     149.48        filebench.time.system_time
    375177           +10.3%     413862        filebench.time.voluntary_context_switches
     18717            +6.5%      19942        proc-vmstat.nr_active_anon
     20206            +1.2%      20445        proc-vmstat.nr_active_file
    298911            +2.2%     305406        proc-vmstat.nr_anon_pages
    132893            +5.6%     140397        proc-vmstat.nr_dirtied
    313040            +2.0%     319443        proc-vmstat.nr_inactive_anon
     32910            +3.4%      34035        proc-vmstat.nr_shmem
     62503            +1.4%      63367        proc-vmstat.nr_slab_unreclaimable
     99471            +3.7%     103159        proc-vmstat.nr_written
     18717            +6.5%      19942        proc-vmstat.nr_zone_active_anon
     20206            +1.2%      20445        proc-vmstat.nr_zone_active_file
    313040            +2.0%     319443        proc-vmstat.nr_zone_inactive_anon
    943632            +3.2%     974142        proc-vmstat.numa_hit
    841654            +3.6%     871757        proc-vmstat.numa_local
    453634 ± 17%     +27.0%     576268 ±  5%  proc-vmstat.numa_pte_updates
     87464            +6.1%      92814        proc-vmstat.pgactivate
   1595438            +2.9%    1641074        proc-vmstat.pgalloc_normal
   1453326            +3.0%    1497530        proc-vmstat.pgfree
     17590 ±  5%     +14.0%      20045 ±  7%  proc-vmstat.pgreuse
    732160            -1.8%     719104        proc-vmstat.unevictable_pgs_scanned
     19.10            -8.1%      17.55        perf-stat.i.MPKI
 2.039e+09           +17.3%  2.393e+09        perf-stat.i.branch-instructions
      1.27 ±  2%      -0.1        1.15        perf-stat.i.branch-miss-rate%
  25600761            +5.8%   27075672        perf-stat.i.branch-misses
   5037721 ±  4%     +11.4%    5612619        perf-stat.i.cache-misses
 1.632e+08            +5.9%  1.729e+08        perf-stat.i.cache-references
     34079           +14.1%      38871        perf-stat.i.context-switches
 1.326e+10           +14.7%  1.521e+10        perf-stat.i.cpu-cycles
    551.02 ±  2%     +21.0%     666.59 ±  3%  perf-stat.i.cpu-migrations
   3953434 ±  2%     +10.8%    4381924 ±  3%  perf-stat.i.dTLB-load-misses
 2.343e+09           +15.4%  2.704e+09        perf-stat.i.dTLB-loads
 1.141e+09           +14.3%  1.303e+09        perf-stat.i.dTLB-stores
 9.047e+09           +14.9%  1.039e+10        perf-stat.i.instructions
      0.69            +2.0%       0.71        perf-stat.i.ipc
      0.14           +14.7%       0.16        perf-stat.i.metric.GHz
     34.94 ±  4%     +11.1%      38.80        perf-stat.i.metric.K/sec
     59.21           +15.6%      68.43        perf-stat.i.metric.M/sec
      3999 ±  3%      +6.3%       4250        perf-stat.i.minor-faults
   1116010 ±  4%     +14.8%    1280875 ±  2%  perf-stat.i.node-load-misses
   1168171 ±  3%      +7.9%    1259922 ±  2%  perf-stat.i.node-stores
      3999 ±  3%      +6.3%       4250        perf-stat.i.page-faults
     18.04            -7.8%      16.64        perf-stat.overall.MPKI
      1.26 ±  2%      -0.1        1.13        perf-stat.overall.branch-miss-rate%
 2.012e+09           +17.3%  2.359e+09        perf-stat.ps.branch-instructions
  25253051            +5.7%   26690222        perf-stat.ps.branch-misses
   4970910 ±  4%     +11.3%    5534021        perf-stat.ps.cache-misses
  1.61e+08            +5.9%  1.705e+08        perf-stat.ps.cache-references
     33628           +14.0%      38332        perf-stat.ps.context-switches
 1.308e+10           +14.6%    1.5e+10        perf-stat.ps.cpu-cycles
    543.73 ±  2%     +20.9%     657.37 ±  3%  perf-stat.ps.cpu-migrations
   3900887 ±  2%     +10.8%    4321011 ±  3%  perf-stat.ps.dTLB-load-misses
 2.312e+09           +15.3%  2.666e+09        perf-stat.ps.dTLB-loads
 1.125e+09           +14.2%  1.285e+09        perf-stat.ps.dTLB-stores
 8.925e+09           +14.8%  1.024e+10        perf-stat.ps.instructions
      3943 ±  3%      +6.2%       4187        perf-stat.ps.minor-faults
   1101275 ±  4%     +14.7%    1263151 ±  2%  perf-stat.ps.node-load-misses
   1152648 ±  3%      +7.7%    1241973 ±  2%  perf-stat.ps.node-stores
      3943 ±  3%      +6.2%       4187        perf-stat.ps.page-faults
 6.777e+11           +10.5%   7.49e+11        perf-stat.total.instructions
      0.01 ±  7%     -28.2%       0.00 ± 26%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.__btrfs_tree_read_lock
      0.30 ± 35%     -63.0%       0.11 ± 25%  perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc.cifs_strndup_to_utf16.cifs_convert_path_to_utf16
     30.21 ±  3%      -6.2%      28.33 ±  3%  perf-sched.total_wait_and_delay.average.ms
     30.15 ±  3%      -6.2%      28.28 ±  3%  perf-sched.total_wait_time.average.ms
      1.08           -20.5%       0.86 ±  2%  perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
     99.86 ± 27%     +71.6%     171.38 ± 32%  perf-sched.wait_and_delay.avg.ms.kthreadd.ret_from_fork.ret_from_fork_asm
      1.10 ±  2%     -16.3%       0.92        perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      1.41 ±  5%     -87.1%       0.18 ±223%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.cifs_call_async
      0.21           -13.4%       0.18        perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
    195.95 ± 10%     -18.4%     159.83 ± 12%  perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.__SMB2_close
      2.60           -23.5%       1.99        perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.query_info
     20.46           -13.7%      17.66 ±  4%  perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_query_path_info
      3.35 ± 66%    +342.5%      14.82 ± 20%  perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
      2103           +10.0%       2312 ±  3%  perf-sched.wait_and_delay.count.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
      1025           +14.8%       1176        perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
      9729 ±  2%     +21.1%      11779        perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      2349 ±  9%     +29.3%       3038 ± 10%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.compound_send_recv
    998.00           +14.3%       1140        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
      1026           +15.0%       1181        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
     18409           +12.5%      20714 ±  4%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      1011           +14.8%       1160        perf-sched.wait_and_delay.count.wait_for_response.compound_send_recv.cifs_send_recv.query_info
      1013           +14.5%       1160        perf-sched.wait_and_delay.count.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
      2.68 ±  4%     -19.6%       2.16 ±  7%  perf-sched.wait_and_delay.max.ms.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
    282.00 ±  3%     -11.3%     250.07 ±  4%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
    280.97 ±  2%     -12.8%     244.97 ±  2%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
      0.49 ±125%     -97.2%       0.01 ±198%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
      1.05           -20.9%       0.83 ±  2%  perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
      2.14 ±  4%     +19.1%       2.55 ±  8%  perf-sched.wait_time.avg.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
     99.82 ± 27%     +69.8%     169.46 ± 31%  perf-sched.wait_time.avg.ms.kthreadd.ret_from_fork.ret_from_fork_asm
      1.08 ±  2%     -16.6%       0.90        perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      1.37 ±  5%     -24.5%       1.03 ±  5%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.cifs_call_async
      0.20           -14.2%       0.17        perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
    195.53 ± 10%     -18.4%     159.54 ± 12%  perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.__SMB2_close
      2.54           -24.0%       1.93        perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.query_info
     20.44           -13.8%      17.63 ±  4%  perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_query_path_info
      3.32 ± 67%    +345.6%      14.78 ± 20%  perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
    245.89 ±  9%     -11.8%     216.92 ±  6%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc.cifs_strndup_to_utf16.cifs_convert_path_to_utf16
      3.14 ±  9%     -43.6%       1.77 ± 40%  perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
      2.65 ±  3%     -19.9%       2.12 ±  6%  perf-sched.wait_time.max.ms.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
      0.57 ±101%     -91.5%       0.05 ±213%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
      1.79 ± 82%     -86.4%       0.24 ± 58%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
    281.92 ±  3%     -11.3%     249.99 ±  4%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
    280.90 ±  2%     -12.8%     244.88 ±  2%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
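
The %change column can be reproduced from the two mean columns. The snippet below is a minimal sketch and is not part of the report or of lkp-tests; the function names are hypothetical. It assumes that entries printed with a % sign are relative changes against the base commit (19e81514b8), and that entries printed without one (for example mpstat.cpu.all.sys%) are plain differences between the two means. The values are copied from the table above.

```python
def relative_change_pct(base: float, patched: float) -> float:
    """Relative change of `patched` versus `base`, in percent."""
    return (patched - base) / base * 100.0


def absolute_delta(base: float, patched: float) -> float:
    """Plain difference between the two means (assumed for percentage metrics)."""
    return patched - base


# (metric, mean on base commit, mean on patched commit), copied from the table
relative_rows = [
    ("filebench.sum_operations/s", 3966, 4442),
    ("filebench.sum_time_ms/op", 25.14, 22.46),
    ("filebench.sum_bytes_mb/s", 13.90, 15.23),
]

for name, base, patched in relative_rows:
    print(f"{name}: {relative_change_pct(base, patched):+.1f}%")

# Metric that is itself a percentage: the table seems to show the raw difference.
print(f"mpstat.cpu.all.sys%: {absolute_delta(2.56, 3.09):+.1f}")
```

Under these assumptions the computed figures should match the corresponding %change entries to one decimal place. The ± %stddev columns are not modelled here.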




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


