Greeting,

FYI, we noticed a 330.6% improvement of fio.write_iops due to commit:


commit: 4e8fc10115a6978060fe8a90f6a3a05463fa0660 ("[PATCHv3 1/1] ext4: Optimize file overwrites")
url: https://github.com/0day-ci/linux/commits/Ritesh-Harjani/Optimize-ext4-file-overwrites-perf-improvement/20200918-131139
base: https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git dev

in testcase: fio-basic
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
with following parameters:

        disk: 2pmem
        fs: ext4
        mount_option: dax
        runtime: 200s
        nr_task: 50%
        time_based: tb
        rw: write
        bs: 4k
        ioengine: sync
        test_size: 200G
        cpufreq_governor: performance
        ucode: 0x5002f01

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based/ucode:
  4k/gcc-9/performance/2pmem/ext4/sync/x86_64-rhel-8.3/dax/50%/debian-10.4-x86_64-20200603.cgz/200s/write/lkp-csl-2sp6/200G/fio-basic/tb/0x5002f01

commit:
  27bc446e2d ("ext4: limit the length of per-inode prealloc list")
  4e8fc10115 ("ext4: Optimize file overwrites")

27bc446e2def38db 4e8fc10115a6978060fe8a90f6a
---------------- ---------------------------
       %stddev     %change         %stddev
           \          |                \
      0.12 ±106%      -0.1        0.01        fio.latency_100us%
     51.38 ± 23%     -48.5        2.85 ± 20%  fio.latency_20us%
      0.01           +16.6       16.64 ± 28%  fio.latency_2us%
      0.24 ±135%     +54.7       54.89 ±  3%  fio.latency_4us%
     32.62 ± 18%     -31.7        0.91 ± 15%  fio.latency_50us%
     14780 ±  3%     -9.4%
13390 fio.time.involuntary_context_switches 9299 -7.0% 8647 fio.time.system_time 228.71 ± 4% +281.9% 873.42 ± 6% fio.time.user_time 23448 -6.5% 21915 fio.time.voluntary_context_switches 5.426e+08 ± 5% +330.6% 2.337e+09 ± 6% fio.workload 10597 ± 5% +330.6% 45638 ± 6% fio.write_bw_MBps 26944 ± 8% -76.8% 6240 ± 9% fio.write_clat_90%_us 30368 ± 8% -72.0% 8512 ± 11% fio.write_clat_95%_us 38016 ± 9% -49.0% 19392 ± 4% fio.write_clat_99%_us 17448 ± 5% -77.9% 3855 ± 7% fio.write_clat_mean_us 11052 ± 32% -68.3% 3502 ± 10% fio.write_clat_stddev 2713004 ± 5% +330.6% 11683335 ± 6% fio.write_iops 13639680 ± 7% +26.6% 17267712 ± 5% meminfo.DirectMap2M 2704 ± 97% +131.9% 6269 ± 26% numa-meminfo.node0.PageTables 676.50 ± 96% +131.1% 1563 ± 26% numa-vmstat.node0.nr_page_table_pages 48.36 -6.8% 45.09 iostat.cpu.system 1.21 ± 4% +271.5% 4.51 ± 6% iostat.cpu.user 0.74 ± 2% +0.1 0.81 ± 5% mpstat.cpu.all.irq% 1.22 ± 4% +3.3 4.55 ± 6% mpstat.cpu.all.usr% 541348 +1.4% 548949 proc-vmstat.nr_file_pages 245833 +2.9% 252840 proc-vmstat.nr_unevictable 245833 +2.9% 252840 proc-vmstat.nr_zone_unevictable 695285 ± 20% -12.6% 607417 ± 17% proc-vmstat.pgfree 601976 ± 2% +22.0% 734594 ± 2% sched_debug.cpu.avg_idle.avg 1001923 +9.0% 1092207 ± 5% sched_debug.cpu.avg_idle.max 372963 -25.8% 276657 ± 6% sched_debug.cpu.avg_idle.stddev 22130 ± 17% +36.2% 30133 ± 14% sched_debug.cpu.nr_switches.max 3374 ± 18% +28.5% 4336 ± 10% sched_debug.cpu.nr_switches.stddev -47.00 -45.7% -25.50 sched_debug.cpu.nr_uninterruptible.min 2816 ± 21% +36.5% 3844 ± 13% sched_debug.cpu.sched_count.stddev 26.69 ± 13% -44.0% 14.94 ± 17% sched_debug.cpu.sched_goidle.min 1424 ± 21% +36.2% 1941 ± 13% sched_debug.cpu.sched_goidle.stddev 1411 ± 18% +31.9% 1861 ± 12% sched_debug.cpu.ttwu_count.stddev 15.42 ± 3% -82.8% 2.66 ± 8% perf-stat.i.MPKI 3.417e+09 ± 4% +239.7% 1.161e+10 ± 6% perf-stat.i.branch-instructions 0.72 -0.1 0.64 perf-stat.i.branch-miss-rate% 24883051 ± 3% +181.5% 70036819 ± 4% perf-stat.i.branch-misses 97563341 ± 12% 
-58.3% 40638724 ± 14% perf-stat.i.cache-misses 2.96e+08 ± 2% -48.4% 1.529e+08 ± 11% perf-stat.i.cache-references 7.06 ± 4% -70.7% 2.06 ± 5% perf-stat.i.cpi 1461 ± 14% +170.2% 3948 ± 19% perf-stat.i.cycles-between-cache-misses 6.17e+09 ± 4% +243.3% 2.119e+10 ± 6% perf-stat.i.dTLB-loads 0.00 ± 11% -0.0 0.00 ± 3% perf-stat.i.dTLB-store-miss-rate% 3.978e+09 ± 4% +257.1% 1.421e+10 ± 6% perf-stat.i.dTLB-stores 83.61 +7.2 90.82 perf-stat.i.iTLB-load-miss-rate% 25688726 ± 3% +126.2% 58108368 ± 5% perf-stat.i.iTLB-load-misses 4852201 +17.7% 5709608 ± 2% perf-stat.i.iTLB-loads 1.962e+10 ± 4% +243.4% 6.738e+10 ± 6% perf-stat.i.instructions 774.43 ± 2% +50.4% 1165 perf-stat.i.instructions-per-iTLB-miss 0.15 ± 4% +235.9% 0.51 ± 6% perf-stat.i.ipc 0.25 ± 2% +51.6% 0.37 ± 3% perf-stat.i.metric.K/sec 144.73 ± 4% +239.5% 491.37 ± 6% perf-stat.i.metric.M/sec 89.29 +2.6 91.93 perf-stat.i.node-load-miss-rate% 12691022 ± 8% -56.3% 5550053 ± 12% perf-stat.i.node-load-misses 1504953 ± 13% -64.4% 535348 ± 15% perf-stat.i.node-loads 9964107 ± 8% -58.8% 4108905 ± 17% perf-stat.i.node-store-misses 15.10 ± 3% -84.9% 2.28 ± 11% perf-stat.overall.MPKI 0.73 -0.1 0.60 perf-stat.overall.branch-miss-rate% 6.86 ± 4% -71.0% 1.99 ± 6% perf-stat.overall.cpi 1401 ± 13% +139.9% 3361 ± 14% perf-stat.overall.cycles-between-cache-misses 0.00 ± 30% -0.0 0.00 ± 45% perf-stat.overall.dTLB-load-miss-rate% 0.00 ± 22% -0.0 0.00 ± 4% perf-stat.overall.dTLB-store-miss-rate% 84.11 +6.9 91.02 perf-stat.overall.iTLB-load-miss-rate% 763.81 ± 2% +51.8% 1159 perf-stat.overall.instructions-per-iTLB-miss 0.15 ± 4% +245.0% 0.50 ± 6% perf-stat.overall.ipc 89.44 +1.8 91.23 perf-stat.overall.node-load-miss-rate% 7276 -20.3% 5801 perf-stat.overall.path-length 3.401e+09 ± 4% +239.6% 1.155e+10 ± 6% perf-stat.ps.branch-instructions 24776511 ± 3% +181.3% 69696643 ± 4% perf-stat.ps.branch-misses 97040508 ± 12% -58.3% 40436979 ± 14% perf-stat.ps.cache-misses 2.945e+08 ± 2% -48.3% 1.522e+08 ± 11% perf-stat.ps.cache-references 
6.141e+09 ± 4% +243.2% 2.108e+10 ± 6% perf-stat.ps.dTLB-loads 3.959e+09 ± 4% +257.0% 1.414e+10 ± 6% perf-stat.ps.dTLB-stores 25562318 ± 3% +126.2% 57814503 ± 5% perf-stat.ps.iTLB-load-misses 4826722 +17.7% 5679789 ± 2% perf-stat.ps.iTLB-loads 1.953e+10 ± 4% +243.3% 6.704e+10 ± 6% perf-stat.ps.instructions 12624818 ± 8% -56.3% 5522769 ± 12% perf-stat.ps.node-load-misses 1497174 ± 13% -64.4% 532776 ± 15% perf-stat.ps.node-loads 9912289 ± 8% -58.8% 4087930 ± 17% perf-stat.ps.node-store-misses 3.947e+12 ± 4% +243.4% 1.355e+13 ± 6% perf-stat.total.instructions 290.75 ± 51% -78.1% 63.75 ±128% interrupts.CPU17.RES:Rescheduling_interrupts 6339 ± 25% -35.3% 4101 ± 52% interrupts.CPU19.NMI:Non-maskable_interrupts 6339 ± 25% -35.3% 4101 ± 52% interrupts.CPU19.PMI:Performance_monitoring_interrupts 166.00 ± 46% -91.6% 14.00 ± 72% interrupts.CPU2.RES:Rescheduling_interrupts 429.75 ± 2% +14.0% 490.00 ± 12% interrupts.CPU20.CAL:Function_call_interrupts 6339 ± 25% -35.3% 4100 ± 52% interrupts.CPU20.NMI:Non-maskable_interrupts 6339 ± 25% -35.3% 4100 ± 52% interrupts.CPU20.PMI:Performance_monitoring_interrupts 6338 ± 25% -31.1% 4364 ± 46% interrupts.CPU21.NMI:Non-maskable_interrupts 6338 ± 25% -31.1% 4364 ± 46% interrupts.CPU21.PMI:Performance_monitoring_interrupts 6339 ± 25% -50.8% 3121 ± 14% interrupts.CPU23.NMI:Non-maskable_interrupts 6339 ± 25% -50.8% 3121 ± 14% interrupts.CPU23.PMI:Performance_monitoring_interrupts 68.50 ± 54% +202.2% 207.00 interrupts.CPU24.RES:Rescheduling_interrupts 3328 ± 45% +76.5% 5876 ± 33% interrupts.CPU25.NMI:Non-maskable_interrupts 3328 ± 45% +76.5% 5876 ± 33% interrupts.CPU25.PMI:Performance_monitoring_interrupts 39.75 ± 79% +423.9% 208.25 ± 2% interrupts.CPU25.RES:Rescheduling_interrupts 1766 ±112% -75.2% 438.25 ± 4% interrupts.CPU27.CAL:Function_call_interrupts 82.75 ± 49% -64.0% 29.75 ±122% interrupts.CPU27.TLB:TLB_shootdowns 439.50 ± 2% +74.2% 765.50 ± 38% interrupts.CPU3.CAL:Function_call_interrupts 494.25 ± 5% -10.5% 442.25 ± 5% 
interrupts.CPU30.CAL:Function_call_interrupts 61.00 ±127% +230.7% 201.75 interrupts.CPU30.RES:Rescheduling_interrupts 56.50 ±140% +255.3% 200.75 interrupts.CPU31.RES:Rescheduling_interrupts 1633 ±123% -73.3% 435.50 ± 3% interrupts.CPU32.CAL:Function_call_interrupts 56.75 ±141% +252.4% 200.00 interrupts.CPU33.RES:Rescheduling_interrupts 56.75 ±139% +227.3% 185.75 ± 12% interrupts.CPU34.RES:Rescheduling_interrupts 56.50 ±142% +185.8% 161.50 ± 39% interrupts.CPU35.RES:Rescheduling_interrupts 79.75 ± 36% -56.4% 34.75 ± 91% interrupts.CPU36.TLB:TLB_shootdowns 65.25 ±117% +176.6% 180.50 ± 30% interrupts.CPU39.RES:Rescheduling_interrupts 78.50 ± 44% -54.1% 36.00 ± 83% interrupts.CPU39.TLB:TLB_shootdowns 62.25 ±120% +151.8% 156.75 ± 45% interrupts.CPU43.RES:Rescheduling_interrupts 86.00 ± 45% -54.4% 39.25 ± 97% interrupts.CPU43.TLB:TLB_shootdowns 487.50 ± 10% -10.8% 434.75 ± 3% interrupts.CPU44.CAL:Function_call_interrupts 93.00 ± 46% -64.5% 33.00 ±119% interrupts.CPU46.TLB:TLB_shootdowns 7330 ± 12% -41.4% 4293 ± 33% interrupts.CPU5.NMI:Non-maskable_interrupts 7330 ± 12% -41.4% 4293 ± 33% interrupts.CPU5.PMI:Performance_monitoring_interrupts 169.25 ± 36% -90.8% 15.50 ± 71% interrupts.CPU5.RES:Rescheduling_interrupts 3285 ± 45% +92.3% 6318 ± 25% interrupts.CPU57.NMI:Non-maskable_interrupts 3285 ± 45% +92.3% 6318 ± 25% interrupts.CPU57.PMI:Performance_monitoring_interrupts 7323 ± 12% -51.2% 3572 ± 34% interrupts.CPU6.NMI:Non-maskable_interrupts 7323 ± 12% -51.2% 3572 ± 34% interrupts.CPU6.PMI:Performance_monitoring_interrupts 32.50 ± 78% +580.0% 221.00 ±125% interrupts.CPU63.TLB:TLB_shootdowns 7323 ± 12% -41.5% 4286 ± 33% interrupts.CPU7.NMI:Non-maskable_interrupts 7323 ± 12% -41.5% 4286 ± 33% interrupts.CPU7.PMI:Performance_monitoring_interrupts 175.50 ± 27% -80.3% 34.50 ± 37% interrupts.CPU72.RES:Rescheduling_interrupts 93.25 ± 45% -57.1% 40.00 ±115% interrupts.CPU72.TLB:TLB_shootdowns 7868 -45.2% 4311 ± 32% interrupts.CPU73.NMI:Non-maskable_interrupts 7868 -45.2% 4311 ± 
32% interrupts.CPU73.PMI:Performance_monitoring_interrupts 7330 ± 12% -41.4% 4297 ± 33% interrupts.CPU75.NMI:Non-maskable_interrupts 7330 ± 12% -41.4% 4297 ± 33% interrupts.CPU75.PMI:Performance_monitoring_interrupts 163.50 ± 41% -84.9% 24.75 ±127% interrupts.CPU77.RES:Rescheduling_interrupts 7324 ± 12% -41.4% 4294 ± 33% interrupts.CPU78.NMI:Non-maskable_interrupts 7324 ± 12% -41.4% 4294 ± 33% interrupts.CPU78.PMI:Performance_monitoring_interrupts 161.25 ± 45% -91.5% 13.75 ±109% interrupts.CPU80.RES:Rescheduling_interrupts 7325 ± 12% -41.5% 4287 ± 33% interrupts.CPU81.NMI:Non-maskable_interrupts 7325 ± 12% -41.5% 4287 ± 33% interrupts.CPU81.PMI:Performance_monitoring_interrupts 95.00 ± 50% -59.7% 38.25 ±117% interrupts.CPU92.TLB:TLB_shootdowns 8991 ±108% +161.3% 23491 ± 19% softirqs.CPU2.SCHED 67870 ± 5% +8.4% 73546 ± 2% softirqs.CPU2.TIMER 23244 ± 25% -88.7% 2626 softirqs.CPU24.SCHED 83405 ± 17% -23.4% 63886 ± 2% softirqs.CPU24.TIMER 23963 ± 12% -88.4% 2784 ± 2% softirqs.CPU25.SCHED 83623 ± 19% -23.5% 63968 ± 2% softirqs.CPU25.TIMER 4276 ± 5% +97.6% 8448 ± 13% softirqs.CPU26.RCU 14129 ± 74% -81.4% 2631 ± 4% softirqs.CPU26.SCHED 17203 ± 53% -70.0% 5163 ± 89% softirqs.CPU27.SCHED 70966 ± 5% -10.4% 63583 ± 5% softirqs.CPU27.TIMER 19121 ± 47% -74.6% 4863 ± 88% softirqs.CPU28.SCHED 72354 ± 6% -10.4% 64858 ± 2% softirqs.CPU29.TIMER 9275 ±101% +151.3% 23309 ± 19% softirqs.CPU3.SCHED 19928 ± 46% -84.7% 3042 ± 7% softirqs.CPU30.SCHED 72106 ± 7% -11.8% 63632 ± 2% softirqs.CPU30.TIMER 19845 ± 45% -84.7% 3030 ± 6% softirqs.CPU31.SCHED 72345 ± 6% -10.8% 64523 softirqs.CPU31.TIMER 19559 ± 47% -84.2% 3094 ± 8% softirqs.CPU32.SCHED 19689 ± 47% -83.0% 3352 ± 2% softirqs.CPU33.SCHED 71873 ± 7% -9.4% 65131 softirqs.CPU33.TIMER 16286 ± 48% -63.6% 5928 ± 76% softirqs.CPU34.SCHED 11784 ± 76% +118.7% 25776 softirqs.CPU4.SCHED 70606 ± 5% -9.8% 63713 softirqs.CPU48.TIMER 71122 ± 4% -10.2% 63890 ± 5% softirqs.CPU49.TIMER 8863 ±108% +190.0% 25702 softirqs.CPU5.SCHED 20026 ± 49% -87.1% 2587 
± 5% softirqs.CPU50.SCHED 70832 ± 4% -10.7% 63286 softirqs.CPU50.TIMER 18874 ± 50% -86.1% 2631 ± 4% softirqs.CPU51.SCHED 71694 ± 5% -13.7% 61847 ± 3% softirqs.CPU51.TIMER 17403 ± 56% -85.3% 2560 softirqs.CPU52.SCHED 71831 ± 8% -11.0% 63942 ± 3% softirqs.CPU52.TIMER 20860 ± 49% -87.1% 2689 ± 2% softirqs.CPU53.SCHED 81014 ± 19% -23.0% 62345 ± 2% softirqs.CPU53.TIMER 20180 ± 50% -87.7% 2480 ± 9% softirqs.CPU54.SCHED 71917 ± 5% -12.3% 63071 softirqs.CPU54.TIMER 74057 ± 12% -16.4% 61946 ± 2% softirqs.CPU55.TIMER 20135 ± 50% -86.8% 2667 ± 4% softirqs.CPU56.SCHED 73377 ± 7% -13.4% 63523 ± 3% softirqs.CPU56.TIMER 23019 ± 19% -64.3% 8226 ±118% softirqs.CPU57.SCHED 75540 ± 5% -14.6% 64485 ± 4% softirqs.CPU57.TIMER 20267 ± 49% -59.4% 8236 ±118% softirqs.CPU58.SCHED 72755 ± 7% -11.1% 64699 ± 3% softirqs.CPU58.TIMER 72871 ± 7% -10.9% 64896 ± 4% softirqs.CPU59.TIMER 8781 ±108% +192.7% 25703 softirqs.CPU6.SCHED 72683 ± 7% -10.9% 64778 ± 4% softirqs.CPU60.TIMER 72665 ± 8% -11.1% 64612 ± 4% softirqs.CPU61.TIMER 72308 ± 5% -10.1% 64991 ± 6% softirqs.CPU65.TIMER 20301 ± 49% -58.5% 8419 ±118% softirqs.CPU66.SCHED 11380 ± 79% +123.7% 25453 softirqs.CPU7.SCHED 4027 ± 5% +111.8% 8530 ± 32% softirqs.CPU71.RCU 5823 ± 96% +357.6% 26649 softirqs.CPU72.SCHED 2461 ± 12% +952.7% 25914 softirqs.CPU73.SCHED 8475 ±117% +176.7% 23452 ± 20% softirqs.CPU75.SCHED 8462 ±116% +178.9% 23601 ± 19% softirqs.CPU76.SCHED 8459 ±117% +211.7% 26366 ± 2% softirqs.CPU77.SCHED 8511 ±117% +205.5% 26002 ± 2% softirqs.CPU79.SCHED 8854 ±105% +186.2% 25341 ± 2% softirqs.CPU8.SCHED 8450 ±116% +215.1% 26629 ± 2% softirqs.CPU80.SCHED 8496 ±117% +206.5% 26038 softirqs.CPU81.SCHED 4144 ± 6% +83.5% 7603 ± 21% softirqs.CPU82.RCU 8429 ±117% +179.7% 23575 ± 18% softirqs.CPU82.SCHED 8393 ±117% +138.6% 20028 ± 30% softirqs.CPU84.SCHED 8422 ±116% +140.8% 20281 ± 28% softirqs.CPU92.SCHED 4021 ± 7% +93.4% 7778 ± 29% softirqs.CPU95.RCU 415214 +63.4% 678631 ± 6% softirqs.RCU 38.06 ± 7% -38.1 0.00 
perf-profile.calltrace.cycles-pp.__ext4_journal_start_sb.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter 36.28 ± 7% -36.3 0.00 perf-profile.calltrace.cycles-pp.jbd2__journal_start.__ext4_journal_start_sb.ext4_iomap_begin.iomap_apply.dax_iomap_rw 36.07 ± 7% -36.1 0.00 perf-profile.calltrace.cycles-pp.start_this_handle.jbd2__journal_start.__ext4_journal_start_sb.ext4_iomap_begin.iomap_apply 63.15 ± 7% -31.9 31.29 ± 12% perf-profile.calltrace.cycles-pp.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter.new_sync_write 11.15 ± 9% -11.1 0.00 perf-profile.calltrace.cycles-pp.__ext4_journal_stop.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter 10.95 ± 9% -11.0 0.00 perf-profile.calltrace.cycles-pp.jbd2_journal_stop.__ext4_journal_stop.ext4_iomap_begin.iomap_apply.dax_iomap_rw 8.81 ± 7% -8.8 0.00 perf-profile.calltrace.cycles-pp.stop_this_handle.jbd2_journal_stop.__ext4_journal_stop.ext4_iomap_begin.iomap_apply 8.49 ± 6% -8.5 0.00 perf-profile.calltrace.cycles-pp.add_transaction_credits.start_this_handle.jbd2__journal_start.__ext4_journal_start_sb.ext4_iomap_begin 5.93 ± 6% -5.9 0.00 perf-profile.calltrace.cycles-pp._raw_read_lock.start_this_handle.jbd2__journal_start.__ext4_journal_start_sb.ext4_iomap_begin 0.99 ± 9% +0.4 1.44 ± 19% perf-profile.calltrace.cycles-pp.ext4_write_checks.ext4_file_write_iter.new_sync_write.vfs_write.ksys_write 0.00 +1.0 0.96 ± 17% perf-profile.calltrace.cycles-pp.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin.iomap_apply.dax_iomap_rw 0.00 +1.1 1.10 ± 20% perf-profile.calltrace.cycles-pp.__check_block_validity.ext4_map_blocks.ext4_iomap_begin.iomap_apply.dax_iomap_rw 0.00 +2.2 2.19 ± 17% perf-profile.calltrace.cycles-pp.ext4_map_blocks.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter 1.94 ± 16% +6.6 8.49 ± 13% perf-profile.calltrace.cycles-pp.__copy_user_nocache.__copy_user_flushcache._copy_from_iter_flushcache.dax_iomap_actor.iomap_apply 1.95 ± 16% +6.6 8.54 ± 13% 
perf-profile.calltrace.cycles-pp.__copy_user_flushcache._copy_from_iter_flushcache.dax_iomap_actor.iomap_apply.dax_iomap_rw 1.99 ± 16% +6.7 8.70 ± 13% perf-profile.calltrace.cycles-pp._copy_from_iter_flushcache.dax_iomap_actor.iomap_apply.dax_iomap_rw.ext4_file_write_iter 7.86 ± 11% +12.8 20.70 ± 13% perf-profile.calltrace.cycles-pp._raw_read_lock.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin.iomap_apply 1.73 ± 15% +13.7 15.42 ± 27% perf-profile.calltrace.cycles-pp.__srcu_read_unlock.dax_iomap_actor.iomap_apply.dax_iomap_rw.ext4_file_write_iter 12.86 ± 7% +14.8 27.69 ± 13% perf-profile.calltrace.cycles-pp.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin.iomap_apply.dax_iomap_rw 13.14 ± 7% +15.7 28.81 ± 13% perf-profile.calltrace.cycles-pp.ext4_set_iomap.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter 3.87 ± 14% +20.9 24.76 ± 20% perf-profile.calltrace.cycles-pp.dax_iomap_actor.iomap_apply.dax_iomap_rw.ext4_file_write_iter.new_sync_write 38.74 ± 7% -38.1 0.65 ± 8% perf-profile.children.cycles-pp.__ext4_journal_start_sb 36.93 ± 7% -36.3 0.61 ± 7% perf-profile.children.cycles-pp.jbd2__journal_start 36.73 ± 7% -36.1 0.60 ± 7% perf-profile.children.cycles-pp.start_this_handle 63.15 ± 7% -31.9 31.30 ± 12% perf-profile.children.cycles-pp.ext4_iomap_begin 11.21 ± 9% -11.2 0.01 ±173% perf-profile.children.cycles-pp.__ext4_journal_stop 11.01 ± 9% -11.0 0.01 ±173% perf-profile.children.cycles-pp.jbd2_journal_stop 8.83 ± 7% -8.8 0.00 perf-profile.children.cycles-pp.stop_this_handle 8.64 ± 7% -8.5 0.14 ± 8% perf-profile.children.cycles-pp.add_transaction_credits 0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.timestamp_truncate 0.00 +0.1 0.06 ± 15% perf-profile.children.cycles-pp.pmem_dax_direct_access 0.00 +0.1 0.06 ± 14% perf-profile.children.cycles-pp.fsnotify_parent 0.00 +0.1 0.06 ± 11% perf-profile.children.cycles-pp.file_modified 0.00 +0.1 0.07 ± 12% perf-profile.children.cycles-pp.aa_file_perm 0.00 +0.1 0.07 ± 12% 
perf-profile.children.cycles-pp.apparmor_file_permission 0.00 +0.1 0.07 ± 15% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 0.00 +0.1 0.08 ± 10% perf-profile.children.cycles-pp.__pmem_direct_access 0.00 +0.1 0.09 ± 9% perf-profile.children.cycles-pp.__x86_indirect_thunk_rax 0.00 +0.1 0.09 ± 7% perf-profile.children.cycles-pp.__might_sleep 0.00 +0.1 0.09 ± 13% perf-profile.children.cycles-pp._cond_resched 0.00 +0.1 0.10 ± 12% perf-profile.children.cycles-pp.___might_sleep 0.00 +0.1 0.12 ± 12% perf-profile.children.cycles-pp.fsnotify 0.04 ± 57% +0.1 0.18 ± 7% perf-profile.children.cycles-pp.__fdget_pos 0.00 +0.1 0.14 ± 7% perf-profile.children.cycles-pp.__fget_light 0.00 +0.2 0.15 ± 10% perf-profile.children.cycles-pp.up_write 0.01 ±173% +0.2 0.17 ± 6% perf-profile.children.cycles-pp.current_time 0.00 +0.2 0.16 ± 11% perf-profile.children.cycles-pp.dax_direct_access 0.06 ± 7% +0.2 0.23 ± 11% perf-profile.children.cycles-pp.__sb_start_write 0.00 +0.2 0.18 ± 72% perf-profile.children.cycles-pp.generic_write_checks 0.04 ± 57% +0.2 0.22 ± 8% perf-profile.children.cycles-pp.__srcu_read_lock 0.06 ± 7% +0.2 0.26 ± 11% perf-profile.children.cycles-pp.entry_SYSCALL_64 0.06 +0.2 0.26 ± 14% perf-profile.children.cycles-pp.common_file_perm 0.05 ± 9% +0.2 0.28 ± 11% perf-profile.children.cycles-pp.down_write 0.00 +0.2 0.23 ± 60% perf-profile.children.cycles-pp.ext4_generic_write_checks 0.09 ± 5% +0.3 0.34 ± 13% perf-profile.children.cycles-pp.syscall_return_via_sysret 0.09 ± 5% +0.3 0.37 ± 14% perf-profile.children.cycles-pp.security_file_permission 0.10 ± 8% +0.4 0.54 ± 25% perf-profile.children.cycles-pp.ext4_inode_block_valid 0.99 ± 9% +0.4 1.44 ± 19% perf-profile.children.cycles-pp.ext4_write_checks 0.04 ± 57% +0.5 0.51 ± 31% perf-profile.children.cycles-pp.percpu_counter_add_batch 0.12 ±173% +0.5 0.65 ± 42% perf-profile.children.cycles-pp.start_kernel 0.17 ± 11% +0.8 0.96 ± 17% perf-profile.children.cycles-pp.ext4_es_lookup_extent 0.19 ± 14% +0.9 1.11 ± 20% 
perf-profile.children.cycles-pp.__check_block_validity 0.39 ± 12% +1.8 2.20 ± 17% perf-profile.children.cycles-pp.ext4_map_blocks 1.94 ± 16% +6.6 8.50 ± 13% perf-profile.children.cycles-pp.__copy_user_nocache 1.95 ± 16% +6.6 8.54 ± 13% perf-profile.children.cycles-pp.__copy_user_flushcache 1.99 ± 16% +6.7 8.70 ± 13% perf-profile.children.cycles-pp._copy_from_iter_flushcache 13.96 ± 9% +7.1 21.04 ± 13% perf-profile.children.cycles-pp._raw_read_lock 1.73 ± 15% +13.7 15.43 ± 27% perf-profile.children.cycles-pp.__srcu_read_unlock 12.87 ± 7% +14.8 27.70 ± 13% perf-profile.children.cycles-pp.jbd2_transaction_committed 13.15 ± 7% +15.7 28.82 ± 13% perf-profile.children.cycles-pp.ext4_set_iomap 3.88 ± 14% +20.9 24.78 ± 20% perf-profile.children.cycles-pp.dax_iomap_actor 21.95 ± 7% -21.6 0.35 ± 8% perf-profile.self.cycles-pp.start_this_handle 8.79 ± 7% -8.8 0.00 perf-profile.self.cycles-pp.stop_this_handle 8.60 ± 7% -8.5 0.14 ± 8% perf-profile.self.cycles-pp.add_transaction_credits 0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.__x86_indirect_thunk_rax 0.00 +0.1 0.06 ± 9% perf-profile.self.cycles-pp.current_time 0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.aa_file_perm 0.00 +0.1 0.06 ± 20% perf-profile.self.cycles-pp.apparmor_file_permission 0.00 +0.1 0.07 ± 20% perf-profile.self.cycles-pp.generic_write_checks 0.00 +0.1 0.07 ± 15% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64 0.00 +0.1 0.08 ± 6% perf-profile.self.cycles-pp.__might_sleep 0.00 +0.1 0.08 ± 10% perf-profile.self.cycles-pp.__pmem_direct_access 0.00 +0.1 0.08 ± 13% perf-profile.self.cycles-pp.__sb_start_write 0.00 +0.1 0.09 ± 13% perf-profile.self.cycles-pp.ksys_write 0.00 +0.1 0.10 ± 12% perf-profile.self.cycles-pp.___might_sleep 0.00 +0.1 0.11 ± 16% perf-profile.self.cycles-pp.dax_iomap_rw 0.00 +0.1 0.11 ± 11% perf-profile.self.cycles-pp.fsnotify 0.00 +0.1 0.12 ± 67% perf-profile.self.cycles-pp.file_update_time 0.00 +0.1 0.13 ± 8% perf-profile.self.cycles-pp.__fget_light 0.00 +0.1 0.13 ± 9% 
perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.00            +0.1        0.14 ± 15%  perf-profile.self.cycles-pp.ext4_map_blocks
      0.00            +0.2        0.15 ± 12%  perf-profile.self.cycles-pp._copy_from_iter_flushcache
      0.04 ± 57%      +0.2        0.19 ± 15%  perf-profile.self.cycles-pp.common_file_perm
      0.00            +0.2        0.15 ± 10%  perf-profile.self.cycles-pp.up_write
      0.00            +0.2        0.17 ± 10%  perf-profile.self.cycles-pp.down_write
      0.04 ± 57%      +0.2        0.21 ± 10%  perf-profile.self.cycles-pp.dax_iomap_actor
      0.01 ±173%      +0.2        0.20 ± 11%  perf-profile.self.cycles-pp.vfs_write
      0.00            +0.2        0.18 ± 15%  perf-profile.self.cycles-pp.do_syscall_64
      0.08 ±  5%      +0.2        0.28 ±  8%  perf-profile.self.cycles-pp.ext4_iomap_begin
      0.06 ± 15%      +0.2        0.25 ± 11%  perf-profile.self.cycles-pp.ext4_es_lookup_extent
      0.06 ±  7%      +0.2        0.26 ± 11%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.01 ±173%      +0.2        0.22 ± 10%  perf-profile.self.cycles-pp.__srcu_read_lock
      0.09 ±  5%      +0.3        0.34 ± 13%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.00            +0.3        0.31 ± 80%  perf-profile.self.cycles-pp.new_sync_write
      0.11 ±  7%      +0.3        0.45 ±  9%  perf-profile.self.cycles-pp.iomap_apply
      0.04 ± 57%      +0.4        0.47 ± 32%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.10 ±  8%      +0.4        0.53 ± 25%  perf-profile.self.cycles-pp.ext4_inode_block_valid
      0.25 ± 12%      +0.5        0.70 ± 25%  perf-profile.self.cycles-pp.ext4_file_write_iter
      0.09 ± 27%      +0.5        0.56 ± 21%  perf-profile.self.cycles-pp.__check_block_validity
      0.27 ± 18%      +0.8        1.11 ± 28%  perf-profile.self.cycles-pp.ext4_set_iomap
      4.99 ±  6%      +2.0        6.95 ± 14%  perf-profile.self.cycles-pp.jbd2_transaction_committed
      1.93 ± 16%      +6.5        8.46 ± 13%  perf-profile.self.cycles-pp.__copy_user_nocache
     13.90 ±  9%      +7.0       20.92 ± 13%  perf-profile.self.cycles-pp._raw_read_lock
      1.73 ± 15%     +13.6       15.35 ± 27%  perf-profile.self.cycles-pp.__srcu_read_unlock


[Plots of per-sample values, one panel per metric: fio.write_bw_MBps,
fio.write_iops, fio.write_clat_mean_us, fio.write_clat_90__us,
fio.write_clat_95__us, fio.latency_4us_, fio.latency_50us_, fio.workload,
fio.time.user_time, fio.time.system_time, fio.time.voluntary_context_switches]

[*] bisect-good sample
[O] bisect-bad sample


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen
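
P.S. The headline numbers can be cross-checked from the fio.write_iops row
alone. The sketch below recomputes the %change and the bandwidth implied by
the IOPS figures. It is only an illustration, not part of the lkp tooling: the
function names are made up for this note, "bs: 4k" is assumed to mean 4096
bytes, and fio.write_bw_MBps is assumed to be in MiB/s.

```python
# Values copied from the fio.write_iops row of the comparison table.
BASE_IOPS = 2713004       # 27bc446e2d (parent commit)
PATCHED_IOPS = 11683335   # 4e8fc10115 ("ext4: Optimize file overwrites")
BLOCK_SIZE = 4 * 1024     # "bs: 4k", assumed to mean 4096 bytes


def percent_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def iops_to_mib_per_s(iops, block_size=BLOCK_SIZE):
    """Bandwidth implied by an IOPS figure (assumes MBps means MiB/s)."""
    return iops * block_size / (1024 * 1024)


if __name__ == "__main__":
    change = percent_change(BASE_IOPS, PATCHED_IOPS)
    print(f"fio.write_iops change: {change:+.1f}%")
    print(f"implied bandwidth, base:    {iops_to_mib_per_s(BASE_IOPS):.1f} MiB/s")
    print(f"implied bandwidth, patched: {iops_to_mib_per_s(PATCHED_IOPS):.1f} MiB/s")
```

Under those assumptions the change comes out at +330.6%, and the implied
bandwidths land within 1 MiB/s of the fio.write_bw_MBps row (10597 and 45638).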