From: kernel test robot <oliver.sang@intel.com>
To: Qu Wenruo <wqu@suse.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
<linux-btrfs@vger.kernel.org>, <ying.huang@intel.com>,
<feng.tang@intel.com>, <fengwei.yin@intel.com>,
<oliver.sang@intel.com>
Subject: Re: [PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory
Date: Wed, 6 Sep 2023 10:45:10 +0800 [thread overview]
Message-ID: <202309061050.19c12499-oliver.sang@intel.com> (raw)
In-Reply-To: <8bc15bfdaa2805d1d1b660b8b2e07a55aa02027d.1692858397.git.wqu@suse.com>
Hello,
kernel test robot noticed a 12.0% improvement in filebench.sum_operations/s on:
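(For reference, the headline figure comes straight from the comparison table below: filebench.sum_operations/s rose from 3966 to 4442, i.e. (4442 - 3966) / 3966 ≈ 12.0%.)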
commit: 2fa4ac9754a7fa77bad88aae11ac77ba137d3858 ("[PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory")
url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-warn-on-tree-blocks-which-are-not-nodesize-aligned/20230824-143628
base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
patch link: https://lore.kernel.org/all/8bc15bfdaa2805d1d1b660b8b2e07a55aa02027d.1692858397.git.wqu@suse.com/
patch subject: [PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory
testcase: filebench
test machine: 96 threads 2 sockets (Ice Lake) with 128G memory
parameters:
disk: 1HDD
fs: btrfs
fs2: cifs
test: webproxy.f
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230906/202309061050.19c12499-oliver.sang@intel.com
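For reference, a minimal sketch of reproducing this locally with lkp-tests (an assumption based on the standard lkp-tests workflow, not part of this report; "job.yaml" stands for the job file from the archive linked above, and "generated-yaml-file" is whatever file split-job produces; exact steps may differ by lkp-tests version):

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml            # install dependencies described in the job file
        bin/lkp split-job --compatible job.yaml  # generate a runnable yaml for "lkp run"
        sudo bin/lkp run generated-yaml-file     # execute the filebench webproxy.f job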
=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
gcc-12/performance/1HDD/cifs/btrfs/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp1/webproxy.f/filebench
commit:
19e81514b8 ("btrfs: map uncontinuous extent buffer pages into virtual address space")
2fa4ac9754 ("btrfs: utilize the physically/virtually continuous extent buffer memory")
19e81514b8c09202 2fa4ac9754a7fa77bad88aae11a
---------------- ---------------------------
%stddev %change %stddev
\ | \
30592 ±194% -92.3% 2343 ± 24% sched_debug.cpu.avg_idle.min
1.38 -5.9% 1.30 iostat.cpu.iowait
4.63 +8.9% 5.04 iostat.cpu.system
2.56 +0.5 3.09 mpstat.cpu.all.sys%
0.54 +0.1 0.61 mpstat.cpu.all.usr%
1996 +3.3% 2062 vmstat.io.bo
33480 +13.5% 37993 vmstat.system.cs
152.67 +12.6% 171.83 turbostat.Avg_MHz
2562 +4.2% 2670 turbostat.Bzy_MHz
5.34 +0.5 5.83 turbostat.C1E%
7.12 ± 12% -21.6% 5.58 ± 12% turbostat.Pkg%pc2
209.72 +1.5% 212.81 turbostat.PkgWatt
4.92 ± 24% +3.5 8.37 ± 32% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
5.13 ± 28% +3.6 8.68 ± 31% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
5.13 ± 28% +3.8 8.90 ± 30% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
5.13 ± 28% +3.8 8.90 ± 30% perf-profile.children.cycles-pp.cpuidle_enter
5.13 ± 28% +3.8 8.90 ± 30% perf-profile.children.cycles-pp.cpuidle_enter_state
5.34 ± 34% +3.9 9.21 ± 28% perf-profile.children.cycles-pp.cpuidle_idle_call
13.90 +9.6% 15.23 filebench.sum_bytes_mb/s
238030 +12.0% 266575 filebench.sum_operations
3966 +12.0% 4442 filebench.sum_operations/s
1043 +12.0% 1168 filebench.sum_reads/s
25.14 -10.7% 22.46 filebench.sum_time_ms/op
208.83 +11.9% 233.67 filebench.sum_writes/s
506705 +5.8% 536097 filebench.time.file_system_outputs
1597 ± 5% -36.1% 1020 ± 3% filebench.time.involuntary_context_switches
61810 ± 2% +6.0% 65519 filebench.time.minor_page_faults
157.67 ± 2% +31.5% 207.33 filebench.time.percent_of_cpu_this_job_got
117.60 ± 2% +27.1% 149.48 filebench.time.system_time
375177 +10.3% 413862 filebench.time.voluntary_context_switches
18717 +6.5% 19942 proc-vmstat.nr_active_anon
20206 +1.2% 20445 proc-vmstat.nr_active_file
298911 +2.2% 305406 proc-vmstat.nr_anon_pages
132893 +5.6% 140397 proc-vmstat.nr_dirtied
313040 +2.0% 319443 proc-vmstat.nr_inactive_anon
32910 +3.4% 34035 proc-vmstat.nr_shmem
62503 +1.4% 63367 proc-vmstat.nr_slab_unreclaimable
99471 +3.7% 103159 proc-vmstat.nr_written
18717 +6.5% 19942 proc-vmstat.nr_zone_active_anon
20206 +1.2% 20445 proc-vmstat.nr_zone_active_file
313040 +2.0% 319443 proc-vmstat.nr_zone_inactive_anon
943632 +3.2% 974142 proc-vmstat.numa_hit
841654 +3.6% 871757 proc-vmstat.numa_local
453634 ± 17% +27.0% 576268 ± 5% proc-vmstat.numa_pte_updates
87464 +6.1% 92814 proc-vmstat.pgactivate
1595438 +2.9% 1641074 proc-vmstat.pgalloc_normal
1453326 +3.0% 1497530 proc-vmstat.pgfree
17590 ± 5% +14.0% 20045 ± 7% proc-vmstat.pgreuse
732160 -1.8% 719104 proc-vmstat.unevictable_pgs_scanned
19.10 -8.1% 17.55 perf-stat.i.MPKI
2.039e+09 +17.3% 2.393e+09 perf-stat.i.branch-instructions
1.27 ± 2% -0.1 1.15 perf-stat.i.branch-miss-rate%
25600761 +5.8% 27075672 perf-stat.i.branch-misses
5037721 ± 4% +11.4% 5612619 perf-stat.i.cache-misses
1.632e+08 +5.9% 1.729e+08 perf-stat.i.cache-references
34079 +14.1% 38871 perf-stat.i.context-switches
1.326e+10 +14.7% 1.521e+10 perf-stat.i.cpu-cycles
551.02 ± 2% +21.0% 666.59 ± 3% perf-stat.i.cpu-migrations
3953434 ± 2% +10.8% 4381924 ± 3% perf-stat.i.dTLB-load-misses
2.343e+09 +15.4% 2.704e+09 perf-stat.i.dTLB-loads
1.141e+09 +14.3% 1.303e+09 perf-stat.i.dTLB-stores
9.047e+09 +14.9% 1.039e+10 perf-stat.i.instructions
0.69 +2.0% 0.71 perf-stat.i.ipc
0.14 +14.7% 0.16 perf-stat.i.metric.GHz
34.94 ± 4% +11.1% 38.80 perf-stat.i.metric.K/sec
59.21 +15.6% 68.43 perf-stat.i.metric.M/sec
3999 ± 3% +6.3% 4250 perf-stat.i.minor-faults
1116010 ± 4% +14.8% 1280875 ± 2% perf-stat.i.node-load-misses
1168171 ± 3% +7.9% 1259922 ± 2% perf-stat.i.node-stores
3999 ± 3% +6.3% 4250 perf-stat.i.page-faults
18.04 -7.8% 16.64 perf-stat.overall.MPKI
1.26 ± 2% -0.1 1.13 perf-stat.overall.branch-miss-rate%
2.012e+09 +17.3% 2.359e+09 perf-stat.ps.branch-instructions
25253051 +5.7% 26690222 perf-stat.ps.branch-misses
4970910 ± 4% +11.3% 5534021 perf-stat.ps.cache-misses
1.61e+08 +5.9% 1.705e+08 perf-stat.ps.cache-references
33628 +14.0% 38332 perf-stat.ps.context-switches
1.308e+10 +14.6% 1.5e+10 perf-stat.ps.cpu-cycles
543.73 ± 2% +20.9% 657.37 ± 3% perf-stat.ps.cpu-migrations
3900887 ± 2% +10.8% 4321011 ± 3% perf-stat.ps.dTLB-load-misses
2.312e+09 +15.3% 2.666e+09 perf-stat.ps.dTLB-loads
1.125e+09 +14.2% 1.285e+09 perf-stat.ps.dTLB-stores
8.925e+09 +14.8% 1.024e+10 perf-stat.ps.instructions
3943 ± 3% +6.2% 4187 perf-stat.ps.minor-faults
1101275 ± 4% +14.7% 1263151 ± 2% perf-stat.ps.node-load-misses
1152648 ± 3% +7.7% 1241973 ± 2% perf-stat.ps.node-stores
3943 ± 3% +6.2% 4187 perf-stat.ps.page-faults
6.777e+11 +10.5% 7.49e+11 perf-stat.total.instructions
0.01 ± 7% -28.2% 0.00 ± 26% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.__btrfs_tree_read_lock
0.30 ± 35% -63.0% 0.11 ± 25% perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc.cifs_strndup_to_utf16.cifs_convert_path_to_utf16
30.21 ± 3% -6.2% 28.33 ± 3% perf-sched.total_wait_and_delay.average.ms
30.15 ± 3% -6.2% 28.28 ± 3% perf-sched.total_wait_time.average.ms
1.08 -20.5% 0.86 ± 2% perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
99.86 ± 27% +71.6% 171.38 ± 32% perf-sched.wait_and_delay.avg.ms.kthreadd.ret_from_fork.ret_from_fork_asm
1.10 ± 2% -16.3% 0.92 perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
1.41 ± 5% -87.1% 0.18 ±223% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.cifs_call_async
0.21 -13.4% 0.18 perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
195.95 ± 10% -18.4% 159.83 ± 12% perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.__SMB2_close
2.60 -23.5% 1.99 perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.query_info
20.46 -13.7% 17.66 ± 4% perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_query_path_info
3.35 ± 66% +342.5% 14.82 ± 20% perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
2103 +10.0% 2312 ± 3% perf-sched.wait_and_delay.count.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
1025 +14.8% 1176 perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
9729 ± 2% +21.1% 11779 perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
2349 ± 9% +29.3% 3038 ± 10% perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.compound_send_recv
998.00 +14.3% 1140 perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
1026 +15.0% 1181 perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
18409 +12.5% 20714 ± 4% perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
1011 +14.8% 1160 perf-sched.wait_and_delay.count.wait_for_response.compound_send_recv.cifs_send_recv.query_info
1013 +14.5% 1160 perf-sched.wait_and_delay.count.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
2.68 ± 4% -19.6% 2.16 ± 7% perf-sched.wait_and_delay.max.ms.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
282.00 ± 3% -11.3% 250.07 ± 4% perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
280.97 ± 2% -12.8% 244.97 ± 2% perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
0.49 ±125% -97.2% 0.01 ±198% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
1.05 -20.9% 0.83 ± 2% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
2.14 ± 4% +19.1% 2.55 ± 8% perf-sched.wait_time.avg.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
99.82 ± 27% +69.8% 169.46 ± 31% perf-sched.wait_time.avg.ms.kthreadd.ret_from_fork.ret_from_fork_asm
1.08 ± 2% -16.6% 0.90 perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
1.37 ± 5% -24.5% 1.03 ± 5% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.cifs_call_async
0.20 -14.2% 0.17 perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
195.53 ± 10% -18.4% 159.54 ± 12% perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.__SMB2_close
2.54 -24.0% 1.93 perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.query_info
20.44 -13.8% 17.63 ± 4% perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_query_path_info
3.32 ± 67% +345.6% 14.78 ± 20% perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
245.89 ± 9% -11.8% 216.92 ± 6% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc.cifs_strndup_to_utf16.cifs_convert_path_to_utf16
3.14 ± 9% -43.6% 1.77 ± 40% perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
2.65 ± 3% -19.9% 2.12 ± 6% perf-sched.wait_time.max.ms.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
0.57 ±101% -91.5% 0.05 ±213% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
1.79 ± 82% -86.4% 0.24 ± 58% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
281.92 ± 3% -11.3% 249.99 ± 4% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
280.90 ± 2% -12.8% 244.88 ± 2% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki