* [linux-next:master] [btrfs] 551e510a97: fio.write_iops 12.4% improvement
From: kernel test robot @ 2026-05-29 9:04 UTC
To: Mark Harmstone; +Cc: oe-lkp, lkp, David Sterba, linux-btrfs, oliver.sang
Hello,
kernel test robot noticed a 12.4% improvement of fio.write_iops on:
commit: 551e510a97a487218d5f22d61d1a3388ef1171ac ("btrfs: don't force DIO writes to be serialized")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: fio-basic
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
	runtime: 300s
	disk: 1HDD
	fs: btrfs
	nr_task: 1
	test_size: 128G
	rw: randwrite
	bs: 4k
	ioengine: falloc
	cpufreq_governor: performance
In addition to that, the commit also has significant impact on the following tests:
+------------------+---------------------------------------------+
| testcase: change | fio-basic: fio.write_iops 10.9% improvement |
| test parameters  | bs=4k                                       |
|                  | cpufreq_governor=performance                |
|                  | disk=1HDD                                   |
|                  | fs=btrfs                                    |
|                  | ioengine=falloc                             |
|                  | nr_task=1                                   |
|                  | runtime=300s                                |
|                  | rw=write                                    |
|                  | test_size=128G                              |
+------------------+---------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260529/202605291631.659bf248-lkp@intel.com
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
4k/gcc-14/performance/1HDD/btrfs/falloc/x86_64-rhel-9.4/1/debian-13-x86_64-20250902.cgz/300s/randwrite/lkp-icl-2sp9/128G/fio-basic
commit:
964f569c14 ("btrfs: limit size of bios submitted from writeback")
551e510a97 ("btrfs: don't force DIO writes to be serialized")
964f569c14d7778c 551e510a97a487218d5f22d61d1
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      0.04 ±  5%      -0.0        0.04 ±  5%  fio.latency_4us%
      5315           +12.4%       5976        fio.write_bw_MBps
    510.67           -14.2%     438.00        fio.write_clat_90%_ns
    515.00           -14.2%     442.00        fio.write_clat_95%_ns
    522.67           -13.8%     450.67        fio.write_clat_99%_ns
    496.01           -14.6%     423.39        fio.write_clat_mean_ns
   1360660           +12.4%    1530010        fio.write_iops
  1.67e+09           -10.0%  1.503e+09        cpuidle..time
      1.48            +2.5%       1.51        iostat.cpu.system
     42426            -6.0%      39867 ±  2%  meminfo.AnonHugePages
    165506 ±  3%      -6.8%     154330        turbostat.IRQ
      2392            +4.8%       2506        vmstat.system.cs
      1945            +4.5%       2034        perf-stat.i.context-switches
      1874            +4.1%       1949        perf-stat.ps.context-switches
   1489031 ±110%    +162.9%    3915040 ±  3%  numa-meminfo.node0.FilePages
     40746 ± 46%     +72.2%      70177 ±  5%  numa-meminfo.node0.KReclaimable
     29851 ±122%    +178.8%      83225        numa-meminfo.node0.Mapped
     40746 ± 46%     +72.2%      70177 ±  5%  numa-meminfo.node0.SReclaimable
    126444 ± 22%     +29.1%     163254 ±  8%  numa-meminfo.node0.Slab
   1485610 ±110%    +163.2%    3910117 ±  3%  numa-meminfo.node0.Unevictable
   2626454 ± 62%     -92.4%     200365 ± 69%  numa-meminfo.node1.FilePages
     56220 ± 65%     -95.9%       2321 ± 67%  numa-meminfo.node1.Mapped
   2622239 ± 62%     -92.5%     197733 ± 70%  numa-meminfo.node1.Unevictable
    372258 ±110%    +162.9%     978760 ±  3%  numa-vmstat.node0.nr_file_pages
      7462 ±122%    +178.8%      20806        numa-vmstat.node0.nr_mapped
     10186 ± 46%     +72.2%      17544 ±  5%  numa-vmstat.node0.nr_slab_reclaimable
    371402 ±110%    +163.2%     977529 ±  3%  numa-vmstat.node0.nr_unevictable
    371402 ±110%    +163.2%     977529 ±  3%  numa-vmstat.node0.nr_zone_unevictable
    656614 ± 62%     -92.4%      50091 ± 69%  numa-vmstat.node1.nr_file_pages
     14054 ± 65%     -95.9%     580.52 ± 67%  numa-vmstat.node1.nr_mapped
    655559 ± 62%     -92.5%      49433 ± 70%  numa-vmstat.node1.nr_unevictable
    655559 ± 62%     -92.5%      49433 ± 70%  numa-vmstat.node1.nr_zone_unevictable
     80.52 ± 25%     -58.3       22.22 ±147%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.get_signal
     80.52 ± 25%     -58.3       22.22 ±147%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
     80.52 ± 25%     -56.6       23.89 ±144%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
     68.85 ± 49%     -50.0       18.89 ±163%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
     68.85 ± 49%     -50.0       18.89 ±163%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
     68.57 ± 49%     -49.7       18.89 ±163%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
     68.85 ± 49%     -48.3       20.56 ±153%  perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
     70.24 ± 51%     -48.0       22.22 ±147%  perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     70.24 ± 51%     -48.0       22.22 ±147%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
     70.24 ± 51%     -48.0       22.22 ±147%  perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.do_syscall_64
     70.24 ± 51%     -48.0       22.22 ±147%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     70.24 ± 51%     -48.0       22.22 ±147%  perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     71.35 ± 48%     -45.8       25.56 ±142%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     71.35 ± 48%     -45.8       25.56 ±142%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     35.48 ± 64%     -24.4       11.11 ±223%  perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
     22.82 ± 62%     -18.4        4.44 ±147%  perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
     23.10 ± 64%     -15.9        7.22 ±169%  perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
     23.10 ± 64%     -15.9        7.22 ±169%  perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
     12.38 ±118%     -12.4        0.00        perf-profile.calltrace.cycles-pp.lruvec_stat_mod_folio.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range
     15.99 ±131%     -10.4        5.56 ±223%  perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
     12.38 ±143%      -9.6        2.78 ±223%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
     12.38 ±143%      -9.6        2.78 ±223%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
     12.38 ±143%      -9.6        2.78 ±223%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
     11.71 ± 98%      -7.3        4.44 ±147%  perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
     83.17 ± 23%     -59.3       23.89 ±144%  perf-profile.children.cycles-pp.__mmput
     83.17 ± 23%     -59.3       23.89 ±144%  perf-profile.children.cycles-pp.exit_mmap
     80.79 ± 24%     -58.6       22.22 ±147%  perf-profile.children.cycles-pp.arch_do_signal_or_restart
     80.79 ± 24%     -58.6       22.22 ±147%  perf-profile.children.cycles-pp.get_signal
     80.79 ± 24%     -56.9       23.89 ±144%  perf-profile.children.cycles-pp.do_exit
     80.79 ± 24%     -56.9       23.89 ±144%  perf-profile.children.cycles-pp.do_group_exit
     80.52 ± 25%     -56.6       23.89 ±144%  perf-profile.children.cycles-pp.exit_mm
     75.95 ± 40%     -50.4       25.56 ±142%  perf-profile.children.cycles-pp.do_syscall_64
     75.95 ± 40%     -50.4       25.56 ±142%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     68.85 ± 49%     -50.0       18.89 ±163%  perf-profile.children.cycles-pp.unmap_page_range
     68.85 ± 49%     -50.0       18.89 ±163%  perf-profile.children.cycles-pp.zap_pmd_range
     68.85 ± 49%     -50.0       18.89 ±163%  perf-profile.children.cycles-pp.zap_pte_range
     68.85 ± 49%     -48.3       20.56 ±153%  perf-profile.children.cycles-pp.unmap_vmas
     70.24 ± 51%     -48.0       22.22 ±147%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
     35.20 ± 66%     -26.9        8.33 ±223%  perf-profile.children.cycles-pp.zap_present_ptes
     23.10 ± 64%     -15.9        7.22 ±169%  perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
     23.10 ± 64%     -15.9        7.22 ±169%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
     23.10 ± 64%     -15.9        7.22 ±169%  perf-profile.children.cycles-pp.tlb_flush_mmu
     19.33 ±102%     -13.8        5.56 ±223%  perf-profile.children.cycles-pp.folio_remove_rmap_ptes
     12.38 ±143%      -9.6        2.78 ±223%  perf-profile.children.cycles-pp.kthread
     12.38 ±143%      -9.6        2.78 ±223%  perf-profile.children.cycles-pp.ret_from_fork
     12.38 ±143%      -9.6        2.78 ±223%  perf-profile.children.cycles-pp.ret_from_fork_asm
      9.05 ±102%      -9.0        0.00        perf-profile.children.cycles-pp.lruvec_stat_mod_folio
     11.71 ± 98%      -7.3        4.44 ±147%  perf-profile.children.cycles-pp.folios_put_refs
     15.60 ± 97%     -15.6        0.00        perf-profile.self.cycles-pp.zap_present_ptes
     13.61 ±107%      -8.1        5.56 ±223%  perf-profile.self.cycles-pp.folio_remove_rmap_ptes
     10.28 ± 94%      -6.9        3.33 ±223%  perf-profile.self.cycles-pp.zap_pte_range
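Note: the %change column for the fio rows above can be reproduced from the two value columns. The following is a minimal sketch, assuming the column is the relative difference between the two commits rounded to one decimal place; `pct_change` is a hypothetical helper written for this illustration and is not part of lkp-tests.

```shell
#!/bin/sh
# Hypothetical helper (not part of lkp-tests): relative change, in percent,
# from the base commit's value ($1) to the patched commit's value ($2).
pct_change() {
    awk -v base="$1" -v cur="$2" \
        'BEGIN { printf "%+.1f%%\n", (cur - base) / base * 100 }'
}

pct_change 5315 5976         # fio.write_bw_MBps      -> +12.4%
pct_change 1360660 1530010   # fio.write_iops         -> +12.4%
pct_change 496.01 423.39     # fio.write_clat_mean_ns -> -14.6%
```

Rows whose change value carries no percent sign (the perf-profile ones) look like plain differences between the two columns rather than relative changes, so this helper does not apply to them.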
***************************************************************************************************
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
4k/gcc-14/performance/1HDD/btrfs/falloc/x86_64-rhel-9.4/1/debian-13-x86_64-20250902.cgz/300s/write/lkp-icl-2sp9/128G/fio-basic
commit:
964f569c14 ("btrfs: limit size of bios submitted from writeback")
551e510a97 ("btrfs: don't force DIO writes to be serialized")
964f569c14d7778c 551e510a97a487218d5f22d61d1
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      0.04 ±  4%      -0.0        0.04 ±  4%  fio.latency_4us%
      6156           +10.9%       6825        fio.write_bw_MBps
    459.33           -13.5%     397.33        fio.write_clat_90%_ns
    462.00           -13.7%     398.67 ±  2%  fio.write_clat_95%_ns
    466.67           -13.6%     403.33 ±  2%  fio.write_clat_99%_ns
    454.12           -14.1%     389.87        fio.write_clat_mean_ns
   1575955           +10.9%    1747309        fio.write_iops
    556943 ± 91%     -62.8%     207326 ±184%  sched_debug.cfs_rq:/.load.max
    152355 ±  2%      -7.7%     140620 ±  3%  turbostat.IRQ
 1.472e+09 ±  2%      -9.2%  1.337e+09        cpuidle..time
     99291 ±  3%      -7.8%      91524 ±  4%  cpuidle..usage
      9.05 ±102%      -9.0        0.00        perf-profile.calltrace.cycles-pp.free_pgtables.exit_mmap.__mmput.exit_mm.do_exit
      9.05 ±102%      -7.4        1.67 ±223%  perf-profile.children.cycles-pp.free_pgtables
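Note: the bandwidth and IOPS rows above are consistent with each other for bs=4k. The sketch below assumes that fio.write_bw_MBps is IOPS times 4096 bytes, expressed in MiB/s and truncated to a whole number; that reading is inferred from the figures in this report, not from lkp-tests documentation, and `iops_to_mibps` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: bandwidth in MiB/s implied by an IOPS figure ($1)
# and a block size in bytes ($2). Assumes 1 MiB = 1048576 bytes and
# truncation to a whole number.
iops_to_mibps() {
    awk -v iops="$1" -v bs="$2" \
        'BEGIN { printf "%d\n", int(iops * bs / 1048576) }'
}

iops_to_mibps 1575955 4096   # base commit, rw=write    -> 6156
iops_to_mibps 1747309 4096   # patched commit, rw=write -> 6825
```

The same calculation on the randwrite figures (1360660 and 1530010 IOPS) gives 5315 and 5976, matching that table as well.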
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linux-next:master] [btrfs] 551e510a97: fio.write_iops 12.4% improvement
From: Mark Harmstone @ 2026-05-29 10:39 UTC
To: kernel test robot; +Cc: oe-lkp, lkp, David Sterba, linux-btrfs
Thanks for this, Oliver. Mind if I offer some constructive criticism?

1) Your fio script has both "buffered=0" and "direct=0", which are
contradictory. It looks like in this case fio treats it as "direct=1",
but this probably should be explicit.

2) Recent versions of btrfs fall back to buffered I/O unless nodatasum is
set. You probably want to add:

    mkdir /fs/sda1/nocow
    chattr +C /fs/sda1/nocow

and then change the directory to be /fs/sda1/nocow.

3) My experience is that for FS performance testing, the only way to get
useful figures is by PCI passthrough of an NVMe drive. Otherwise the VM
overhead is enough to skew the figures.

Mark
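
Note: points 1 and 2 above could be combined roughly as follows. This is only a sketch: the actual lkp fio-basic job file is not shown in this thread, so the option values simply mirror the parameters listed in the report, and the mapping of nr_task to numjobs and test_size to size is an assumption. `make_nocow_dir` and `print_job` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only; not the lkp-tests job definition.

# Hypothetical helper: create directory $1 and set the No_COW attribute,
# which new files created inside the directory inherit.
make_nocow_dir() {
    mkdir -p "$1" && chattr +C "$1"
}

# Hypothetical helper: print a fio job file that targets directory $1 and
# states direct=1 explicitly instead of combining buffered= and direct=.
print_job() {
    cat <<EOF
[global]
directory=$1
direct=1
ioengine=falloc
bs=4k
size=128G
runtime=300
numjobs=1

[randwrite]
rw=randwrite
EOF
}

# Example (needs root and a mounted btrfs filesystem):
#   make_nocow_dir /fs/sda1/nocow
#   print_job /fs/sda1/nocow > nocow.fio
#   fio nocow.fio
```

Nothing runs when the file is sourced; the helpers only take effect when called as in the commented example.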
* Re: [linux-next:master] [btrfs] 551e510a97: fio.write_iops 12.4% improvement
From: Oliver Sang @ 2026-06-01 1:52 UTC
To: Mark Harmstone; +Cc: oe-lkp, lkp, David Sterba, linux-btrfs, oliver.sang
hi, Mark,
On Fri, May 29, 2026 at 11:39:39AM +0100, Mark Harmstone wrote:
> Thanks for this Oliver, mind if I offer some constructive criticism?
>
> 1) Your fio script has both "buffered=0" and "direct=0", which are
> contradictory. It looks like in this case fio treats it as "direct=1",
> but this probably should be explicit.
>
> 2) Recent versions of btrfs fall back to buffered I/O unless nodatasum is
> set. You probably want to add:
>
> mkdir /fs/sda1/nocow
> chattr +C /fs/sda1/nocow
>
> and then change the directory to be /fs/sda1/nocow.
>
> 3) My experience is that for FS performance testing, the only way to get
> useful figures is by PCI passthrough of an NVMe drive. Otherwise the VM
> overhead is enough to skew the figures.
Thanks a lot for educating us on these points! We will investigate further
how to improve our tests. Thanks again!
>
> Mark
>