linux-btrfs.vger.kernel.org archive mirror
From: kernel test robot <oliver.sang@intel.com>
To: Qu Wenruo <wqu@suse.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-btrfs@vger.kernel.org>, <ying.huang@intel.com>,
	<feng.tang@intel.com>, <fengwei.yin@intel.com>,
	<oliver.sang@intel.com>
Subject: Re: [PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory
Date: Wed, 6 Sep 2023 10:45:10 +0800	[thread overview]
Message-ID: <202309061050.19c12499-oliver.sang@intel.com> (raw)
In-Reply-To: <8bc15bfdaa2805d1d1b660b8b2e07a55aa02027d.1692858397.git.wqu@suse.com>



Hello,

kernel test robot noticed a 12.0% improvement in filebench.sum_operations/s on:


commit: 2fa4ac9754a7fa77bad88aae11ac77ba137d3858 ("[PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory")
url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-warn-on-tree-blocks-which-are-not-nodesize-aligned/20230824-143628
base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
patch link: https://lore.kernel.org/all/8bc15bfdaa2805d1d1b660b8b2e07a55aa02027d.1692858397.git.wqu@suse.com/
patch subject: [PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory

testcase: filebench
test machine: 96 threads 2 sockets (Ice Lake) with 128G memory
parameters:

	disk: 1HDD
	fs: btrfs
	fs2: cifs
	test: webproxy.f
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230906/202309061050.19c12499-oliver.sang@intel.com
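For anyone wanting to reproduce, a rough sketch of driving the 0-day lkp-tests harness is below. The `lkp install` / `lkp run` subcommand names are taken from the lkp-tests wiki and should be treated as assumptions to verify against your checkout; the authoritative job file, kernel config and reproduce script are in the archive linked above.

```shell
#!/bin/sh
# Sketch of reproducing via the 0-day lkp-tests harness. The lkp subcommands
# are assumed from the project wiki; check your checkout's docs before running.
set -eu

# Materials archive named in this report (job file, kernel config, reproduce script).
ARCHIVE="https://download.01.org/0day-ci/archive/20230906/202309061050.19c12499-oliver.sang@intel.com"

# An actual run needs a matching test box (lkp-icl-2sp1 here), so the steps
# are only printed rather than executed.
cat <<EOF
# 1. Get the harness:
git clone https://github.com/intel/lkp-tests.git && cd lkp-tests

# 2. Install dependencies for the job (subcommand assumed from the wiki):
sudo bin/lkp install path/to/job.yaml

# 3. Boot the kernel built from the archive's config, then run the job:
sudo bin/lkp run path/to/job.yaml

# Materials: $ARCHIVE
EOF
```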

=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/1HDD/cifs/btrfs/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp1/webproxy.f/filebench

commit: 
  19e81514b8 ("btrfs: map uncontinuous extent buffer pages into virtual address space")
  2fa4ac9754 ("btrfs: utilize the physically/virtually continuous extent buffer memory")

19e81514b8c09202 2fa4ac9754a7fa77bad88aae11a 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     30592 ±194%     -92.3%       2343 ± 24%  sched_debug.cpu.avg_idle.min
      1.38            -5.9%       1.30        iostat.cpu.iowait
      4.63            +8.9%       5.04        iostat.cpu.system
      2.56            +0.5        3.09        mpstat.cpu.all.sys%
      0.54            +0.1        0.61        mpstat.cpu.all.usr%
      1996            +3.3%       2062        vmstat.io.bo
     33480           +13.5%      37993        vmstat.system.cs
    152.67           +12.6%     171.83        turbostat.Avg_MHz
      2562            +4.2%       2670        turbostat.Bzy_MHz
      5.34            +0.5        5.83        turbostat.C1E%
      7.12 ± 12%     -21.6%       5.58 ± 12%  turbostat.Pkg%pc2
    209.72            +1.5%     212.81        turbostat.PkgWatt
      4.92 ± 24%      +3.5        8.37 ± 32%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      5.13 ± 28%      +3.6        8.68 ± 31%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      5.13 ± 28%      +3.8        8.90 ± 30%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      5.13 ± 28%      +3.8        8.90 ± 30%  perf-profile.children.cycles-pp.cpuidle_enter
      5.13 ± 28%      +3.8        8.90 ± 30%  perf-profile.children.cycles-pp.cpuidle_enter_state
      5.34 ± 34%      +3.9        9.21 ± 28%  perf-profile.children.cycles-pp.cpuidle_idle_call
     13.90            +9.6%      15.23        filebench.sum_bytes_mb/s
    238030           +12.0%     266575        filebench.sum_operations
      3966           +12.0%       4442        filebench.sum_operations/s
      1043           +12.0%       1168        filebench.sum_reads/s
     25.14           -10.7%      22.46        filebench.sum_time_ms/op
    208.83           +11.9%     233.67        filebench.sum_writes/s
    506705            +5.8%     536097        filebench.time.file_system_outputs
      1597 ±  5%     -36.1%       1020 ±  3%  filebench.time.involuntary_context_switches
     61810 ±  2%      +6.0%      65519        filebench.time.minor_page_faults
    157.67 ±  2%     +31.5%     207.33        filebench.time.percent_of_cpu_this_job_got
    117.60 ±  2%     +27.1%     149.48        filebench.time.system_time
    375177           +10.3%     413862        filebench.time.voluntary_context_switches
     18717            +6.5%      19942        proc-vmstat.nr_active_anon
     20206            +1.2%      20445        proc-vmstat.nr_active_file
    298911            +2.2%     305406        proc-vmstat.nr_anon_pages
    132893            +5.6%     140397        proc-vmstat.nr_dirtied
    313040            +2.0%     319443        proc-vmstat.nr_inactive_anon
     32910            +3.4%      34035        proc-vmstat.nr_shmem
     62503            +1.4%      63367        proc-vmstat.nr_slab_unreclaimable
     99471            +3.7%     103159        proc-vmstat.nr_written
     18717            +6.5%      19942        proc-vmstat.nr_zone_active_anon
     20206            +1.2%      20445        proc-vmstat.nr_zone_active_file
    313040            +2.0%     319443        proc-vmstat.nr_zone_inactive_anon
    943632            +3.2%     974142        proc-vmstat.numa_hit
    841654            +3.6%     871757        proc-vmstat.numa_local
    453634 ± 17%     +27.0%     576268 ±  5%  proc-vmstat.numa_pte_updates
     87464            +6.1%      92814        proc-vmstat.pgactivate
   1595438            +2.9%    1641074        proc-vmstat.pgalloc_normal
   1453326            +3.0%    1497530        proc-vmstat.pgfree
     17590 ±  5%     +14.0%      20045 ±  7%  proc-vmstat.pgreuse
    732160            -1.8%     719104        proc-vmstat.unevictable_pgs_scanned
     19.10            -8.1%      17.55        perf-stat.i.MPKI
 2.039e+09           +17.3%  2.393e+09        perf-stat.i.branch-instructions
      1.27 ±  2%      -0.1        1.15        perf-stat.i.branch-miss-rate%
  25600761            +5.8%   27075672        perf-stat.i.branch-misses
   5037721 ±  4%     +11.4%    5612619        perf-stat.i.cache-misses
 1.632e+08            +5.9%  1.729e+08        perf-stat.i.cache-references
     34079           +14.1%      38871        perf-stat.i.context-switches
 1.326e+10           +14.7%  1.521e+10        perf-stat.i.cpu-cycles
    551.02 ±  2%     +21.0%     666.59 ±  3%  perf-stat.i.cpu-migrations
   3953434 ±  2%     +10.8%    4381924 ±  3%  perf-stat.i.dTLB-load-misses
 2.343e+09           +15.4%  2.704e+09        perf-stat.i.dTLB-loads
 1.141e+09           +14.3%  1.303e+09        perf-stat.i.dTLB-stores
 9.047e+09           +14.9%  1.039e+10        perf-stat.i.instructions
      0.69            +2.0%       0.71        perf-stat.i.ipc
      0.14           +14.7%       0.16        perf-stat.i.metric.GHz
     34.94 ±  4%     +11.1%      38.80        perf-stat.i.metric.K/sec
     59.21           +15.6%      68.43        perf-stat.i.metric.M/sec
      3999 ±  3%      +6.3%       4250        perf-stat.i.minor-faults
   1116010 ±  4%     +14.8%    1280875 ±  2%  perf-stat.i.node-load-misses
   1168171 ±  3%      +7.9%    1259922 ±  2%  perf-stat.i.node-stores
      3999 ±  3%      +6.3%       4250        perf-stat.i.page-faults
     18.04            -7.8%      16.64        perf-stat.overall.MPKI
      1.26 ±  2%      -0.1        1.13        perf-stat.overall.branch-miss-rate%
 2.012e+09           +17.3%  2.359e+09        perf-stat.ps.branch-instructions
  25253051            +5.7%   26690222        perf-stat.ps.branch-misses
   4970910 ±  4%     +11.3%    5534021        perf-stat.ps.cache-misses
  1.61e+08            +5.9%  1.705e+08        perf-stat.ps.cache-references
     33628           +14.0%      38332        perf-stat.ps.context-switches
 1.308e+10           +14.6%    1.5e+10        perf-stat.ps.cpu-cycles
    543.73 ±  2%     +20.9%     657.37 ±  3%  perf-stat.ps.cpu-migrations
   3900887 ±  2%     +10.8%    4321011 ±  3%  perf-stat.ps.dTLB-load-misses
 2.312e+09           +15.3%  2.666e+09        perf-stat.ps.dTLB-loads
 1.125e+09           +14.2%  1.285e+09        perf-stat.ps.dTLB-stores
 8.925e+09           +14.8%  1.024e+10        perf-stat.ps.instructions
      3943 ±  3%      +6.2%       4187        perf-stat.ps.minor-faults
   1101275 ±  4%     +14.7%    1263151 ±  2%  perf-stat.ps.node-load-misses
   1152648 ±  3%      +7.7%    1241973 ±  2%  perf-stat.ps.node-stores
      3943 ±  3%      +6.2%       4187        perf-stat.ps.page-faults
 6.777e+11           +10.5%   7.49e+11        perf-stat.total.instructions
      0.01 ±  7%     -28.2%       0.00 ± 26%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.__btrfs_tree_read_lock
      0.30 ± 35%     -63.0%       0.11 ± 25%  perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc.cifs_strndup_to_utf16.cifs_convert_path_to_utf16
     30.21 ±  3%      -6.2%      28.33 ±  3%  perf-sched.total_wait_and_delay.average.ms
     30.15 ±  3%      -6.2%      28.28 ±  3%  perf-sched.total_wait_time.average.ms
      1.08           -20.5%       0.86 ±  2%  perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
     99.86 ± 27%     +71.6%     171.38 ± 32%  perf-sched.wait_and_delay.avg.ms.kthreadd.ret_from_fork.ret_from_fork_asm
      1.10 ±  2%     -16.3%       0.92        perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      1.41 ±  5%     -87.1%       0.18 ±223%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.cifs_call_async
      0.21           -13.4%       0.18        perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
    195.95 ± 10%     -18.4%     159.83 ± 12%  perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.__SMB2_close
      2.60           -23.5%       1.99        perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.query_info
     20.46           -13.7%      17.66 ±  4%  perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_query_path_info
      3.35 ± 66%    +342.5%      14.82 ± 20%  perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
      2103           +10.0%       2312 ±  3%  perf-sched.wait_and_delay.count.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
      1025           +14.8%       1176        perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
      9729 ±  2%     +21.1%      11779        perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      2349 ±  9%     +29.3%       3038 ± 10%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.compound_send_recv
    998.00           +14.3%       1140        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
      1026           +15.0%       1181        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
     18409           +12.5%      20714 ±  4%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      1011           +14.8%       1160        perf-sched.wait_and_delay.count.wait_for_response.compound_send_recv.cifs_send_recv.query_info
      1013           +14.5%       1160        perf-sched.wait_and_delay.count.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
      2.68 ±  4%     -19.6%       2.16 ±  7%  perf-sched.wait_and_delay.max.ms.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
    282.00 ±  3%     -11.3%     250.07 ±  4%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
    280.97 ±  2%     -12.8%     244.97 ±  2%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
      0.49 ±125%     -97.2%       0.01 ±198%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
      1.05           -20.9%       0.83 ±  2%  perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
      2.14 ±  4%     +19.1%       2.55 ±  8%  perf-sched.wait_time.avg.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
     99.82 ± 27%     +69.8%     169.46 ± 31%  perf-sched.wait_time.avg.ms.kthreadd.ret_from_fork.ret_from_fork_asm
      1.08 ±  2%     -16.6%       0.90        perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      1.37 ±  5%     -24.5%       1.03 ±  5%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.cifs_call_async
      0.20           -14.2%       0.17        perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
    195.53 ± 10%     -18.4%     159.54 ± 12%  perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.__SMB2_close
      2.54           -24.0%       1.93        perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.query_info
     20.44           -13.8%      17.63 ±  4%  perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_query_path_info
      3.32 ± 67%    +345.6%      14.78 ± 20%  perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
    245.89 ±  9%     -11.8%     216.92 ±  6%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc.cifs_strndup_to_utf16.cifs_convert_path_to_utf16
      3.14 ±  9%     -43.6%       1.77 ± 40%  perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
      2.65 ±  3%     -19.9%       2.12 ±  6%  perf-sched.wait_time.max.ms.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
      0.57 ±101%     -91.5%       0.05 ±213%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
      1.79 ± 82%     -86.4%       0.24 ± 58%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
    281.92 ±  3%     -11.3%     249.99 ±  4%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
    280.90 ±  2%     -12.8%     244.88 ±  2%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
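For readers unfamiliar with the comparison format: the %change column is the relative delta between the base-commit mean (left column) and the patched-commit mean (right column). A minimal sketch, using the filebench rows from the table above:

```shell
# %change is (patched - base) / base * 100, printed to one decimal place.
pct_change() {
    # $1 = base-commit mean, $2 = patched-commit mean
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

pct_change 238030 266575   # filebench.sum_operations        -> 12.0
pct_change 3966   4442     # filebench.sum_operations/s      -> 12.0 (the headline number)
```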




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Thread overview: 9+ messages
2023-08-24  6:33 [PATCH 0/3] btrfs: make extent buffer memory continuous Qu Wenruo
2023-08-24  6:33 ` [PATCH 1/3] btrfs: warn on tree blocks which are not nodesize aligned Qu Wenruo
2023-09-06  9:34   ` Anand Jain
2023-09-06 16:53     ` David Sterba
2023-08-24  6:33 ` [PATCH 2/3] btrfs: map uncontinuous extent buffer pages into virtual address space Qu Wenruo
2023-08-28 10:36   ` Johannes Thumshirn
2023-08-24  6:33 ` [PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory Qu Wenruo
2023-09-06  2:45   ` kernel test robot [this message]
2023-09-06 17:49 ` [PATCH 0/3] btrfs: make extent buffer memory continuous David Sterba
