From: kernel test robot <oliver.sang@intel.com>
To: Qu Wenruo <wqu@suse.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
<linux-btrfs@vger.kernel.org>, <ying.huang@intel.com>,
<feng.tang@intel.com>, <fengwei.yin@intel.com>,
<oliver.sang@intel.com>
Subject: Re: [PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory
Date: Wed, 6 Sep 2023 10:45:10 +0800
Message-ID: <202309061050.19c12499-oliver.sang@intel.com>
In-Reply-To: <8bc15bfdaa2805d1d1b660b8b2e07a55aa02027d.1692858397.git.wqu@suse.com>
Hello,
kernel test robot noticed a 12.0% improvement of filebench.sum_operations/s on:
commit: 2fa4ac9754a7fa77bad88aae11ac77ba137d3858 ("[PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory")
url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-warn-on-tree-blocks-which-are-not-nodesize-aligned/20230824-143628
base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
patch link: https://lore.kernel.org/all/8bc15bfdaa2805d1d1b660b8b2e07a55aa02027d.1692858397.git.wqu@suse.com/
patch subject: [PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory
testcase: filebench
test machine: 96 threads 2 sockets (Ice Lake) with 128G memory
parameters:
disk: 1HDD
fs: btrfs
fs2: cifs
test: webproxy.f
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230906/202309061050.19c12499-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
gcc-12/performance/1HDD/cifs/btrfs/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp1/webproxy.f/filebench
commit:
19e81514b8 ("btrfs: map uncontinuous extent buffer pages into virtual address space")
2fa4ac9754 ("btrfs: utilize the physically/virtually continuous extent buffer memory")
19e81514b8c09202 2fa4ac9754a7fa77bad88aae11a
---------------- ---------------------------
value ±%stddev %change value ±%stddev metric
30592 ±194% -92.3% 2343 ± 24% sched_debug.cpu.avg_idle.min
1.38 -5.9% 1.30 iostat.cpu.iowait
4.63 +8.9% 5.04 iostat.cpu.system
2.56 +0.5 3.09 mpstat.cpu.all.sys%
0.54 +0.1 0.61 mpstat.cpu.all.usr%
1996 +3.3% 2062 vmstat.io.bo
33480 +13.5% 37993 vmstat.system.cs
152.67 +12.6% 171.83 turbostat.Avg_MHz
2562 +4.2% 2670 turbostat.Bzy_MHz
5.34 +0.5 5.83 turbostat.C1E%
7.12 ± 12% -21.6% 5.58 ± 12% turbostat.Pkg%pc2
209.72 +1.5% 212.81 turbostat.PkgWatt
4.92 ± 24% +3.5 8.37 ± 32% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
5.13 ± 28% +3.6 8.68 ± 31% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
5.13 ± 28% +3.8 8.90 ± 30% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
5.13 ± 28% +3.8 8.90 ± 30% perf-profile.children.cycles-pp.cpuidle_enter
5.13 ± 28% +3.8 8.90 ± 30% perf-profile.children.cycles-pp.cpuidle_enter_state
5.34 ± 34% +3.9 9.21 ± 28% perf-profile.children.cycles-pp.cpuidle_idle_call
13.90 +9.6% 15.23 filebench.sum_bytes_mb/s
238030 +12.0% 266575 filebench.sum_operations
3966 +12.0% 4442 filebench.sum_operations/s
1043 +12.0% 1168 filebench.sum_reads/s
25.14 -10.7% 22.46 filebench.sum_time_ms/op
208.83 +11.9% 233.67 filebench.sum_writes/s
506705 +5.8% 536097 filebench.time.file_system_outputs
1597 ± 5% -36.1% 1020 ± 3% filebench.time.involuntary_context_switches
61810 ± 2% +6.0% 65519 filebench.time.minor_page_faults
157.67 ± 2% +31.5% 207.33 filebench.time.percent_of_cpu_this_job_got
117.60 ± 2% +27.1% 149.48 filebench.time.system_time
375177 +10.3% 413862 filebench.time.voluntary_context_switches
18717 +6.5% 19942 proc-vmstat.nr_active_anon
20206 +1.2% 20445 proc-vmstat.nr_active_file
298911 +2.2% 305406 proc-vmstat.nr_anon_pages
132893 +5.6% 140397 proc-vmstat.nr_dirtied
313040 +2.0% 319443 proc-vmstat.nr_inactive_anon
32910 +3.4% 34035 proc-vmstat.nr_shmem
62503 +1.4% 63367 proc-vmstat.nr_slab_unreclaimable
99471 +3.7% 103159 proc-vmstat.nr_written
18717 +6.5% 19942 proc-vmstat.nr_zone_active_anon
20206 +1.2% 20445 proc-vmstat.nr_zone_active_file
313040 +2.0% 319443 proc-vmstat.nr_zone_inactive_anon
943632 +3.2% 974142 proc-vmstat.numa_hit
841654 +3.6% 871757 proc-vmstat.numa_local
453634 ± 17% +27.0% 576268 ± 5% proc-vmstat.numa_pte_updates
87464 +6.1% 92814 proc-vmstat.pgactivate
1595438 +2.9% 1641074 proc-vmstat.pgalloc_normal
1453326 +3.0% 1497530 proc-vmstat.pgfree
17590 ± 5% +14.0% 20045 ± 7% proc-vmstat.pgreuse
732160 -1.8% 719104 proc-vmstat.unevictable_pgs_scanned
19.10 -8.1% 17.55 perf-stat.i.MPKI
2.039e+09 +17.3% 2.393e+09 perf-stat.i.branch-instructions
1.27 ± 2% -0.1 1.15 perf-stat.i.branch-miss-rate%
25600761 +5.8% 27075672 perf-stat.i.branch-misses
5037721 ± 4% +11.4% 5612619 perf-stat.i.cache-misses
1.632e+08 +5.9% 1.729e+08 perf-stat.i.cache-references
34079 +14.1% 38871 perf-stat.i.context-switches
1.326e+10 +14.7% 1.521e+10 perf-stat.i.cpu-cycles
551.02 ± 2% +21.0% 666.59 ± 3% perf-stat.i.cpu-migrations
3953434 ± 2% +10.8% 4381924 ± 3% perf-stat.i.dTLB-load-misses
2.343e+09 +15.4% 2.704e+09 perf-stat.i.dTLB-loads
1.141e+09 +14.3% 1.303e+09 perf-stat.i.dTLB-stores
9.047e+09 +14.9% 1.039e+10 perf-stat.i.instructions
0.69 +2.0% 0.71 perf-stat.i.ipc
0.14 +14.7% 0.16 perf-stat.i.metric.GHz
34.94 ± 4% +11.1% 38.80 perf-stat.i.metric.K/sec
59.21 +15.6% 68.43 perf-stat.i.metric.M/sec
3999 ± 3% +6.3% 4250 perf-stat.i.minor-faults
1116010 ± 4% +14.8% 1280875 ± 2% perf-stat.i.node-load-misses
1168171 ± 3% +7.9% 1259922 ± 2% perf-stat.i.node-stores
3999 ± 3% +6.3% 4250 perf-stat.i.page-faults
18.04 -7.8% 16.64 perf-stat.overall.MPKI
1.26 ± 2% -0.1 1.13 perf-stat.overall.branch-miss-rate%
2.012e+09 +17.3% 2.359e+09 perf-stat.ps.branch-instructions
25253051 +5.7% 26690222 perf-stat.ps.branch-misses
4970910 ± 4% +11.3% 5534021 perf-stat.ps.cache-misses
1.61e+08 +5.9% 1.705e+08 perf-stat.ps.cache-references
33628 +14.0% 38332 perf-stat.ps.context-switches
1.308e+10 +14.6% 1.5e+10 perf-stat.ps.cpu-cycles
543.73 ± 2% +20.9% 657.37 ± 3% perf-stat.ps.cpu-migrations
3900887 ± 2% +10.8% 4321011 ± 3% perf-stat.ps.dTLB-load-misses
2.312e+09 +15.3% 2.666e+09 perf-stat.ps.dTLB-loads
1.125e+09 +14.2% 1.285e+09 perf-stat.ps.dTLB-stores
8.925e+09 +14.8% 1.024e+10 perf-stat.ps.instructions
3943 ± 3% +6.2% 4187 perf-stat.ps.minor-faults
1101275 ± 4% +14.7% 1263151 ± 2% perf-stat.ps.node-load-misses
1152648 ± 3% +7.7% 1241973 ± 2% perf-stat.ps.node-stores
3943 ± 3% +6.2% 4187 perf-stat.ps.page-faults
6.777e+11 +10.5% 7.49e+11 perf-stat.total.instructions
0.01 ± 7% -28.2% 0.00 ± 26% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.__btrfs_tree_read_lock
0.30 ± 35% -63.0% 0.11 ± 25% perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc.cifs_strndup_to_utf16.cifs_convert_path_to_utf16
30.21 ± 3% -6.2% 28.33 ± 3% perf-sched.total_wait_and_delay.average.ms
30.15 ± 3% -6.2% 28.28 ± 3% perf-sched.total_wait_time.average.ms
1.08 -20.5% 0.86 ± 2% perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
99.86 ± 27% +71.6% 171.38 ± 32% perf-sched.wait_and_delay.avg.ms.kthreadd.ret_from_fork.ret_from_fork_asm
1.10 ± 2% -16.3% 0.92 perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
1.41 ± 5% -87.1% 0.18 ±223% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.cifs_call_async
0.21 -13.4% 0.18 perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
195.95 ± 10% -18.4% 159.83 ± 12% perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.__SMB2_close
2.60 -23.5% 1.99 perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.query_info
20.46 -13.7% 17.66 ± 4% perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_query_path_info
3.35 ± 66% +342.5% 14.82 ± 20% perf-sched.wait_and_delay.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
2103 +10.0% 2312 ± 3% perf-sched.wait_and_delay.count.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
1025 +14.8% 1176 perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
9729 ± 2% +21.1% 11779 perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
2349 ± 9% +29.3% 3038 ± 10% perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.compound_send_recv
998.00 +14.3% 1140 perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
1026 +15.0% 1181 perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
18409 +12.5% 20714 ± 4% perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
1011 +14.8% 1160 perf-sched.wait_and_delay.count.wait_for_response.compound_send_recv.cifs_send_recv.query_info
1013 +14.5% 1160 perf-sched.wait_and_delay.count.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
2.68 ± 4% -19.6% 2.16 ± 7% perf-sched.wait_and_delay.max.ms.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
282.00 ± 3% -11.3% 250.07 ± 4% perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
280.97 ± 2% -12.8% 244.97 ± 2% perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
0.49 ±125% -97.2% 0.01 ±198% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
1.05 -20.9% 0.83 ± 2% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
2.14 ± 4% +19.1% 2.55 ± 8% perf-sched.wait_time.avg.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
99.82 ± 27% +69.8% 169.46 ± 31% perf-sched.wait_time.avg.ms.kthreadd.ret_from_fork.ret_from_fork_asm
1.08 ± 2% -16.6% 0.90 perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
1.37 ± 5% -24.5% 1.03 ± 5% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.cifs_call_async
0.20 -14.2% 0.17 perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
195.53 ± 10% -18.4% 159.54 ± 12% perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.__SMB2_close
2.54 -24.0% 1.93 perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.cifs_send_recv.query_info
20.44 -13.8% 17.63 ± 4% perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_query_path_info
3.32 ± 67% +345.6% 14.78 ± 20% perf-sched.wait_time.avg.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_unlink
245.89 ± 9% -11.8% 216.92 ± 6% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc.cifs_strndup_to_utf16.cifs_convert_path_to_utf16
3.14 ± 9% -43.6% 1.77 ± 40% perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
2.65 ± 3% -19.9% 2.12 ± 6% perf-sched.wait_time.max.ms.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
0.57 ±101% -91.5% 0.05 ±213% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
1.79 ± 82% -86.4% 0.24 ± 58% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
281.92 ± 3% -11.3% 249.99 ± 4% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
280.90 ± 2% -12.8% 244.88 ± 2% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
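The %change column in the table above can be sanity-checked from the two mean columns. The following is a minimal sketch, not part of the LKP tooling: the helper name `pct_change` and the choice of rows are illustrative assumptions, and the inputs are the rounded means printed in the report, so a recomputed value may differ from the reported one in the last digit. Rows whose change column carries no `%` sign (for example `mpstat.cpu.all.sys%`, 2.56 to 3.09, shown as +0.5) appear to be absolute differences in percentage points rather than relative changes.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change of the patched mean versus the base mean, in percent."""
    return (patched - base) / base * 100.0


# (metric, base mean, patched mean), copied from the table in this report.
rows = [
    ("filebench.sum_operations/s", 3966, 4442),
    ("filebench.sum_time_ms/op", 25.14, 22.46),
    ("filebench.sum_bytes_mb/s", 13.90, 15.23),
]

for name, base, patched in rows:
    # One decimal place, matching the precision used in the report.
    print(f"{pct_change(base, patched):+6.1f}%  {name}")
```

For the three rows chosen here, the recomputed values round to the +12.0%, -10.7% and +9.6% shown in the report.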
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Thread overview: 9+ messages
2023-08-24 6:33 [PATCH 0/3] btrfs: make extent buffer memory continuous Qu Wenruo
2023-08-24 6:33 ` [PATCH 1/3] btrfs: warn on tree blocks which are not nodesize aligned Qu Wenruo
2023-09-06 9:34 ` Anand Jain
2023-09-06 16:53 ` David Sterba
2023-08-24 6:33 ` [PATCH 2/3] btrfs: map uncontinuous extent buffer pages into virtual address space Qu Wenruo
2023-08-28 10:36 ` Johannes Thumshirn
2023-08-24 6:33 ` [PATCH 3/3] btrfs: utilize the physically/virtually continuous extent buffer memory Qu Wenruo
2023-09-06 2:45 ` kernel test robot [this message]
2023-09-06 17:49 ` [PATCH 0/3] btrfs: make extent buffer memory continuous David Sterba