All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yin Fengwei <fengwei.yin@intel.com>
To: lkp@lists.01.org
Subject: Re: [xfs] 2bf1ec0ff0: stress-ng.mknod.ops_per_sec -45.4% regression
Date: Thu, 09 Sep 2021 15:07:35 +0800	[thread overview]
Message-ID: <ec35eb2e-a488-ecfe-22c6-bf86d7eef145@intel.com> (raw)
In-Reply-To: <a149d14e-48e0-bad9-4283-e560f03a941f@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 30642 bytes --]

Hi Dave,

On 9/3/2021 3:43 PM, Xing Zhengjun wrote:
> Hi Dave,
> 
>    Do you have time to look at this? Thanks.
> 
> On 8/4/2021 10:38 PM, kernel test robot wrote:
>>
>>
>> Greeting,
>>
>> FYI, we noticed a -45.4% regression of stress-ng.mknod.ops_per_sec due to commit:
>>
>>
>> commit: 2bf1ec0ff067ff8f692d261b29c713f3583f7e2a ("xfs: log forces imply data device cache flushes")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

We noticed the commit 2bf1ec0ff067ff8f692d261b29c713f3583f7e2a is actually a
fix to the race introduced by commit eef983ffeae7a1cdde8c3338155ae2dd74b8621b.

Then we did a test for stress-ng.mknod.ops_per_sec between commit:
  eef983ffeae7a1cdde8c3338155ae2dd74b8621b
and its parent commit:
  3468bb1ca6e8840789e13c7b9d8b0c556b4fbe79

We got:
commit: 
  3468bb1ca6e8840789e13c7b9d8b0c556b4fbe79
  eef983ffeae7a1cdde8c3338155ae2dd74b8621b

3468bb1ca6e88407 eef983ffeae7a1cdde8c3338155 
---------------- --------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
        112:1          456%         117:1     dmesg.timestamp:last
        103:1          553%         109:1     kmsg.timestamp:last
          1:1            0%           1:1     pmeter.pmeter.fail
          1:1            0%           1:1     stress-ng.mknod.pass
         %stddev     %change         %stddev
             \          |                \  
     47979           +80.6%      86655        stress-ng.mknod.ops
    799.29           +80.6%       1443        stress-ng.mknod.ops_per_sec

It's like commit eef983ffeae7a1cdde8c3338155ae2dd74b8621b bought 80%
performance gain. But it introduced race issue.

Commit 2bf1ec0ff067ff8f692d261b29c713f3583f7e2a fixed the race issue
and lost about 50% performance gain. Suppose we still have 30%
performance gain with race issue fixed. So it's still a good change.

We will close this report.


Regards
Yin, Fengwei

>>
>>
>> in testcase: stress-ng
>> on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
>> with following parameters:
>>
>>     nr_threads: 10%
>>     disk: 1HDD
>>     testtime: 60s
>>     fs: xfs
>>     class: filesystem
>>     test: mknod
>>     cpufreq_governor: performance
>>     ucode: 0x5003006
>>
>>
>>
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>>
>> To reproduce:
>>
>>          git clone https://github.com/intel/lkp-tests.git
>>          cd lkp-tests
>>          bin/lkp install                job.yaml  # job file is attached in this email
>>          bin/lkp split-job --compatible job.yaml  # generate the yaml file for lkp run
>>          bin/lkp run                    generated-yaml-file
>>
>> =========================================================================================
>> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
>>    filesystem/gcc-9/performance/1HDD/xfs/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/mknod/stress-ng/60s/0x5003006
>>
>> commit:
>>    45eddb4140 ("xfs: factor out forced iclog flushes")
>>    2bf1ec0ff0 ("xfs: log forces imply data device cache flushes")
>>
>> 45eddb414047c366 2bf1ec0ff067ff8f692d261b29c
>> ---------------- ---------------------------
>>           %stddev     %change         %stddev
>>               \          |                \
>>       87135           -45.4%      47612        stress-ng.mknod.ops
>>        1451           -45.4%     793.24        stress-ng.mknod.ops_per_sec
>>       23.00 ±  7%     -51.4%      11.17 ± 12%  stress-ng.time.percent_of_cpu_this_job_got
>>       13.80 ±  7%     -50.7%       6.80 ±  8%  stress-ng.time.system_time
>>      211458 ±  2%     -62.4%      79605 ± 14%  stress-ng.time.voluntary_context_switches
>>        0.30 ±  2%      -0.1        0.18        mpstat.cpu.all.sys%
>>       13993 ±  2%     -38.1%       8659        softirqs.BLOCK
>>       27747            -2.3%      27103        proc-vmstat.nr_slab_reclaimable
>>       76223 ±  3%     -57.3%      32548 ±  5%  proc-vmstat.pgpgout
>>        1170 ±  3%     -57.3%     500.17 ±  5%  vmstat.io.bo
>>       10752           -47.7%       5624 ±  6%  vmstat.system.cs
>>       32820           -34.5%      21499        slabinfo.kmalloc-rcl-64.active_objs
>>      512.67           -34.6%     335.50        slabinfo.kmalloc-rcl-64.active_slabs
>>       32836           -34.5%      21508        slabinfo.kmalloc-rcl-64.num_objs
>>      512.67           -34.6%     335.50        slabinfo.kmalloc-rcl-64.num_slabs
>>       24913 ±  2%     -42.8%      14251        interrupts.315:PCI-MSI.376832-edge.ahci[0000:00:17.0]
>>       71844 ±  6%     -19.9%      57564 ±  3%  interrupts.CAL:Function_call_interrupts
>>      206.50 ± 37%     -42.9%     118.00 ± 41%  interrupts.CPU1.NMI:Non-maskable_interrupts
>>      206.50 ± 37%     -42.9%     118.00 ± 41%  interrupts.CPU1.PMI:Performance_monitoring_interrupts
>>      173.17 ± 26%     -31.7%     118.33 ± 38%  interrupts.CPU2.NMI:Non-maskable_interrupts
>>      173.17 ± 26%     -31.7%     118.33 ± 38%  interrupts.CPU2.PMI:Performance_monitoring_interrupts
>>      189.83 ± 34%     -40.6%     112.83 ± 25%  interrupts.CPU49.NMI:Non-maskable_interrupts
>>      189.83 ± 34%     -40.6%     112.83 ± 25%  interrupts.CPU49.PMI:Performance_monitoring_interrupts
>>      181.67 ± 25%     -38.3%     112.17 ± 24%  interrupts.CPU50.NMI:Non-maskable_interrupts
>>      181.67 ± 25%     -38.3%     112.17 ± 24%  interrupts.CPU50.PMI:Performance_monitoring_interrupts
>>        1895 ± 16%     -32.8%       1274 ± 20%  interrupts.RES:Rescheduling_interrupts
>>   3.127e+08 ±  5%     -12.2%  2.746e+08 ±  3%  perf-stat.i.branch-instructions
>>       10683           -49.8%       5364 ±  6%  perf-stat.i.context-switches
>>   3.866e+08 ±  4%     -14.2%  3.318e+08 ±  3%  perf-stat.i.dTLB-loads
>>   1.949e+08 ±  3%     -14.7%  1.662e+08 ±  3%  perf-stat.i.dTLB-stores
>>   1.538e+09 ±  5%     -12.4%  1.348e+09 ±  3%  perf-stat.i.instructions
>>        9.31 ±  4%     -13.6%       8.04 ±  3%  perf-stat.i.metric.M/sec
>>      127426 ± 14%     -32.6%      85864 ±  9%  perf-stat.i.node-load-misses
>>   3.078e+08 ±  5%     -12.2%  2.702e+08 ±  3%  perf-stat.ps.branch-instructions
>>       10512           -49.8%       5279 ±  6%  perf-stat.ps.context-switches
>>   3.805e+08 ±  4%     -14.2%  3.265e+08 ±  3%  perf-stat.ps.dTLB-loads
>>   1.918e+08 ±  3%     -14.7%  1.635e+08 ±  3%  perf-stat.ps.dTLB-stores
>>   1.514e+09 ±  5%     -12.4%  1.327e+09 ±  3%  perf-stat.ps.instructions
>>      125399 ± 14%     -32.6%      84480 ±  9%  perf-stat.ps.node-load-misses
>>   9.563e+10 ±  6%     -12.3%  8.383e+10 ±  3%  perf-stat.total.instructions
>>        8.31 ± 20%      -3.9        4.45 ± 25%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>>        8.30 ± 20%      -3.9        4.45 ± 25%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        2.35 ± 19%      -1.6        0.75 ± 25%  perf-profile.calltrace.cycles-pp.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        2.35 ± 19%      -1.6        0.75 ± 25%  perf-profile.calltrace.cycles-pp.ksys_sync.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        2.08 ± 19%      -1.6        0.50 ± 72%  perf-profile.calltrace.cycles-pp.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        1.85 ± 25%      -0.7        1.14 ± 31%  perf-profile.calltrace.cycles-pp.do_mknodat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        1.04 ± 27%      -0.7        0.36 ±106%  perf-profile.calltrace.cycles-pp.xfs_inactive_ifree.xfs_inactive.xfs_fs_destroy_inode.destroy_inode.do_unlinkat
>>        1.08 ± 27%      -0.6        0.46 ± 77%  perf-profile.calltrace.cycles-pp.destroy_inode.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        1.05 ± 27%      -0.6        0.45 ± 77%  perf-profile.calltrace.cycles-pp.xfs_fs_destroy_inode.destroy_inode.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        1.04 ± 27%      -0.6        0.44 ± 78%  perf-profile.calltrace.cycles-pp.xfs_inactive.xfs_fs_destroy_inode.destroy_inode.do_unlinkat.do_syscall_64
>>        0.89 ± 17%      -0.3        0.58 ±  9%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
>>        1.21 ±  8%      +0.2        1.45 ±  7%  perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
>>        8.13 ±  9%      +2.0       10.16 ±  6%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>>        2.35 ± 19%      -1.6        0.75 ± 25%  perf-profile.children.cycles-pp.__x64_sys_sync
>>        2.35 ± 19%      -1.6        0.75 ± 25%  perf-profile.children.cycles-pp.ksys_sync
>>        2.09 ± 19%      -1.4        0.65 ± 25%  perf-profile.children.cycles-pp.iterate_supers
>>        2.25 ± 18%      -1.1        1.14 ± 26%  perf-profile.children.cycles-pp._raw_spin_lock
>>        1.40 ± 25%      -1.1        0.31 ± 54%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>>        1.85 ± 25%      -0.7        1.14 ± 31%  perf-profile.children.cycles-pp.do_mknodat
>>        1.08 ± 27%      -0.5        0.55 ± 43%  perf-profile.children.cycles-pp.destroy_inode
>>        1.05 ± 27%      -0.5        0.54 ± 42%  perf-profile.children.cycles-pp.xfs_fs_destroy_inode
>>        1.04 ± 27%      -0.5        0.54 ± 43%  perf-profile.children.cycles-pp.xfs_inactive
>>        1.04 ± 27%      -0.5        0.53 ± 44%  perf-profile.children.cycles-pp.xfs_inactive_ifree
>>        0.93 ± 21%      -0.5        0.44 ± 24%  perf-profile.children.cycles-pp.__schedule
>>        0.49 ± 25%      -0.4        0.11 ± 76%  perf-profile.children.cycles-pp.pick_next_task_fair
>>        1.05 ± 20%      -0.4        0.69 ± 13%  perf-profile.children.cycles-pp.find_busiest_group
>>        0.66 ± 19%      -0.3        0.32 ± 30%  perf-profile.children.cycles-pp.schedule
>>        1.00 ± 20%      -0.3        0.65 ± 13%  perf-profile.children.cycles-pp.update_sd_lb_stats
>>        0.60 ± 20%      -0.3        0.29 ± 31%  perf-profile.children.cycles-pp.xfs_ifree
>>        0.89 ± 17%      -0.3        0.58 ±  9%  perf-profile.children.cycles-pp.worker_thread
>>        0.44 ± 26%      -0.3        0.16 ± 45%  perf-profile.children.cycles-pp.newidle_balance
>>        0.49 ± 39%      -0.3        0.22 ± 35%  perf-profile.children.cycles-pp.xfs_fs_sync_fs
>>        0.40 ± 49%      -0.2        0.17 ± 29%  perf-profile.children.cycles-pp.xfs_log_force
>>        0.31 ± 29%      -0.2        0.11 ± 56%  perf-profile.children.cycles-pp.schedule_timeout
>>        0.24 ± 60%      -0.2        0.06 ± 50%  perf-profile.children.cycles-pp.xfs_inobt_get_rec
>>        0.38 ± 19%      -0.2        0.21 ± 30%  perf-profile.children.cycles-pp.xfs_btree_lookup
>>        0.42 ± 14%      -0.2        0.26 ± 35%  perf-profile.children.cycles-pp.xfs_check_agi_freecount
>>        0.30 ± 27%      -0.2        0.14 ± 37%  perf-profile.children.cycles-pp.xfs_btree_lookup_get_block
>>        0.25 ± 30%      -0.1        0.10 ± 39%  perf-profile.children.cycles-pp.iterate_bdevs
>>        0.33 ± 28%      -0.1        0.19 ± 20%  perf-profile.children.cycles-pp.try_to_wake_up
>>        0.26 ± 31%      -0.1        0.13 ± 23%  perf-profile.children.cycles-pp.schedule_idle
>>        0.16 ± 36%      -0.1        0.03 ±102%  perf-profile.children.cycles-pp.__down
>>        0.16 ± 35%      -0.1        0.04 ± 73%  perf-profile.children.cycles-pp.xfs_buf_lock
>>        0.16 ± 36%      -0.1        0.04 ± 73%  perf-profile.children.cycles-pp.down
>>        0.21 ± 39%      -0.1        0.09 ± 52%  perf-profile.children.cycles-pp.xfs_iunlink_remove
>>        0.19 ± 28%      -0.1        0.08 ± 43%  perf-profile.children.cycles-pp.xfs_buf_item_release
>>        0.21 ± 33%      -0.1        0.10 ± 35%  perf-profile.children.cycles-pp.xfs_difree_inobt
>>        0.14 ± 16%      -0.1        0.04 ±104%  perf-profile.children.cycles-pp._xfs_trans_bjoin
>>        0.18 ± 30%      -0.1        0.08 ± 80%  perf-profile.children.cycles-pp.wait_for_completion
>>        0.15 ± 27%      -0.1        0.06 ± 53%  perf-profile.children.cycles-pp.up
>>        0.15 ± 33%      -0.1        0.06 ± 81%  perf-profile.children.cycles-pp.xfs_difree_finobt
>>        0.15 ± 27%      -0.1        0.08 ± 19%  perf-profile.children.cycles-pp.xfs_trans_alloc
>>        0.10 ± 29%      -0.1        0.04 ± 72%  perf-profile.children.cycles-pp.xfs_log_reserve
>>        0.04 ± 77%      +0.1        0.10 ± 34%  perf-profile.children.cycles-pp.balance_fair
>>        0.05 ± 79%      +0.1        0.15 ± 31%  perf-profile.children.cycles-pp.smpboot_thread_fn
>>        0.58 ± 16%      +0.1        0.72 ± 11%  perf-profile.children.cycles-pp.update_rq_clock
>>        1.24 ±  8%      +0.3        1.52 ±  4%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
>>        8.31 ±  9%      +2.1       10.40 ±  5%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>>        1.30 ± 18%      -1.0        0.30 ± 54%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>>        0.75 ± 21%      -0.3        0.44 ± 20%  perf-profile.self.cycles-pp.__percpu_counter_sum
>>        0.43 ± 25%      -0.3        0.16 ± 45%  perf-profile.self.cycles-pp.down_read
>>        0.72 ± 13%      -0.2        0.47 ± 12%  perf-profile.self.cycles-pp.update_sd_lb_stats
>>        0.35 ± 15%      +0.1        0.43 ± 13%  perf-profile.self.cycles-pp.irqtime_account_irq
>>        0.08 ± 60%      +0.1        0.16 ± 18%  perf-profile.self.cycles-pp.timerqueue_del
>>
>>
>>                                                                                                              stress-ng.time.system_time
>>                                                                                     20 +----------------------------------------------------------------------+
>>       |            +                                                         |
>>    18 |-+     +   ::                                       .+                |
>>       |      : :  ::                   +..               .+  :               |
>>    16 |-+    : : :  :                 :       +.    +   +    :     +         |
>>       |:    +   ::  :                 :   +  :  +  : : :      :  .. +        |
>>    14 |:+  +    +   :         .+.   .+     + :   + : : :      +.+    +.   .+.|
>>       | +.+         :  +.+.+.+   +.+        +     +   +                +.+   |
>>    12 |-+           :  :                                                     |
>>       |              ::                                                      |
>>    10 |-+            ::                                                      |
>>       |              +                                                       |
>>     8 |-+                    O   O   O      O   O       O O                  |
>>       | O O     O    O     O   O       O      O   O O       O                |
>>     6 +----------------------------------------------------------------------+
>>                                                                                                                                                                                                                                                                       stress-ng.time.percent_of_cpu_this_job_got
>>                                                                                     35 +----------------------------------------------------------------------+
>>       |                                                                      |
>>       |       +    +                                        +                |
>>    30 |-+     ::  ::                                       +:                |
>>       |      : :  ::                   +..    +     +   +.+  :     +         |
>>       |:     :  ::  :                 :      : +   : : :     :   .. :        |
>>    25 |:+  .+   +   :                 :   +. :  +. : : :      :.+   :        |
>>       | +.+         :  +.+.+.+.+.+.+.+      +     +   +       +      +.+.+.+.|
>>    20 |-+           :  :                                                     |
>>       |             : :                                                      |
>>       |              ::                                                      |
>>    15 |-+            :                                                       |
>>       |              +       O   O   O      O   O       O O                  |
>>       | O O O O O  O O O O O   O   O   O  O   O   O O       O                |
>>    10 +----------------------------------------------------------------------+
>>                                                                                                                                                                                                                                                                          stress-ng.time.voluntary_context_switches
>>                                                                                     260000 +------------------------------------------------------------------+
>>    240000 |-+     +                                                          |
>>           |      + :  +.                  +                 .+. .+.          |
>>    220000 |++.+.+  : +  +.   .+.         + + +.+. .+.+   +.+   +   +. .+. .+.|
>>    200000 |-+       +     +.+   +. .+. .+   +    +    + +            +   +   |
>>           |                       +   +                +                     |
>>    180000 |-+                                                                |
>>    160000 |-+                                                                |
>>    140000 |-+                                                                |
>>           |                                                                  |
>>    120000 |-+           O                                                    |
>>    100000 |-O   O         O O                          O                     |
>>           |       O   O                   O    O   O       O   O             |
>>     80000 |-+ O               O O O   O O   OO   O   O   O   O               |
>>     60000 +------------------------------------------------------------------+
>>                                                                                                                                                                                                                                                                                     stress-ng.mknod.ops
>>                                                                                     95000 +-------------------------------------------------------------------+
>>    90000 |-+    .+. .+.                                     .+. .+.          |
>>          |.+.+.+   +   +.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+   +   +. .+.+.+.|
>>    85000 |-+                                                         +       |
>>    80000 |-+                                                                 |
>>          |                                                                   |
>>    75000 |-+                                                                 |
>>    70000 |-+                                                                 |
>>    65000 |-+                                                                 |
>>          |                                                                   |
>>    60000 |-+                                                                 |
>>    55000 |-+                                                                 |
>>          |                                                                   |
>>    50000 |-O O O O O O O O O O O O O O O O O O O O O O O O O   O O           |
>>    45000 +-------------------------------------------------------------------+
>>                                                                                                                                                                                                                                                                                stress-ng.mknod.ops_per_sec
>>                                                                                     1600 +--------------------------------------------------------------------+
>>         |      .+.                                          .+. .+           |
>>    1500 |.+.+.+   +.+.+.+.+.+. .+.+.+.+.+..+.+.+.+.+.+.+.+.+   +  +   .+.+.+.|
>>    1400 |-+                   +                                    +.+       |
>>         |                                                                    |
>>    1300 |-+                                                                  |
>>    1200 |-+                                                                  |
>>         |                                                                    |
>>    1100 |-+                                                                  |
>>    1000 |-+                                                                  |
>>         |                                                                    |
>>     900 |-+                                                                  |
>>     800 |-+   O O   O O O     O O       O      O O     O O O     O           |
>>         | O O     O       O O     O O O    O O     O O       O O             |
>>     700 +--------------------------------------------------------------------+
>>                                                                                                                                                                   [*] bisect-good sample
>> [O] bisect-bad  sample
>>
>>
>>
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or software
>> design or configuration may affect actual performance.
>>
>>
>> ---
>> 0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
>> https://lists.01.org/hyperkitty/list/lkp(a)lists.01.org       Intel Corporation
>>
>> Thanks,
>> Oliver Sang
>>
> 

WARNING: multiple messages have this Message-ID (diff)
From: Yin Fengwei <fengwei.yin@intel.com>
To: Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	kernel test robot <oliver.sang@intel.com>,
	Dave Chinner <dchinner@redhat.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	LKML <linux-kernel@vger.kernel.org>, <lkp@lists.01.org>,
	<lkp@intel.com>
Subject: Re: [LKP] Re: [xfs] 2bf1ec0ff0: stress-ng.mknod.ops_per_sec -45.4% regression
Date: Thu, 9 Sep 2021 15:07:35 +0800	[thread overview]
Message-ID: <ec35eb2e-a488-ecfe-22c6-bf86d7eef145@intel.com> (raw)
In-Reply-To: <a149d14e-48e0-bad9-4283-e560f03a941f@linux.intel.com>

Hi Dave,

On 9/3/2021 3:43 PM, Xing Zhengjun wrote:
> Hi Dave,
> 
>    Do you have time to look at this? Thanks.
> 
> On 8/4/2021 10:38 PM, kernel test robot wrote:
>>
>>
>> Greeting,
>>
>> FYI, we noticed a -45.4% regression of stress-ng.mknod.ops_per_sec due to commit:
>>
>>
>> commit: 2bf1ec0ff067ff8f692d261b29c713f3583f7e2a ("xfs: log forces imply data device cache flushes")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

We noticed the commit 2bf1ec0ff067ff8f692d261b29c713f3583f7e2a is actually a
fix to the race introduced by commit eef983ffeae7a1cdde8c3338155ae2dd74b8621b.

Then we did a test for stress-ng.mknod.ops_per_sec between commit:
  eef983ffeae7a1cdde8c3338155ae2dd74b8621b
and its parent commit:
  3468bb1ca6e8840789e13c7b9d8b0c556b4fbe79

We got:
commit: 
  3468bb1ca6e8840789e13c7b9d8b0c556b4fbe79
  eef983ffeae7a1cdde8c3338155ae2dd74b8621b

3468bb1ca6e88407 eef983ffeae7a1cdde8c3338155 
---------------- --------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
        112:1          456%         117:1     dmesg.timestamp:last
        103:1          553%         109:1     kmsg.timestamp:last
          1:1            0%           1:1     pmeter.pmeter.fail
          1:1            0%           1:1     stress-ng.mknod.pass
         %stddev     %change         %stddev
             \          |                \  
     47979           +80.6%      86655        stress-ng.mknod.ops
    799.29           +80.6%       1443        stress-ng.mknod.ops_per_sec

It's like commit eef983ffeae7a1cdde8c3338155ae2dd74b8621b bought 80%
performance gain. But it introduced race issue.

Commit 2bf1ec0ff067ff8f692d261b29c713f3583f7e2a fixed the race issue
and lost about 50% performance gain. Suppose we still have 30%
performance gain with race issue fixed. So it's still a good change.

We will close this report.


Regards
Yin, Fengwei

>>
>>
>> in testcase: stress-ng
>> on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
>> with following parameters:
>>
>>     nr_threads: 10%
>>     disk: 1HDD
>>     testtime: 60s
>>     fs: xfs
>>     class: filesystem
>>     test: mknod
>>     cpufreq_governor: performance
>>     ucode: 0x5003006
>>
>>
>>
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>>
>> To reproduce:
>>
>>          git clone https://github.com/intel/lkp-tests.git
>>          cd lkp-tests
>>          bin/lkp install                job.yaml  # job file is attached in this email
>>          bin/lkp split-job --compatible job.yaml  # generate the yaml file for lkp run
>>          bin/lkp run                    generated-yaml-file
>>
>> =========================================================================================
>> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
>>    filesystem/gcc-9/performance/1HDD/xfs/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/mknod/stress-ng/60s/0x5003006
>>
>> commit:
>>    45eddb4140 ("xfs: factor out forced iclog flushes")
>>    2bf1ec0ff0 ("xfs: log forces imply data device cache flushes")
>>
>> 45eddb414047c366 2bf1ec0ff067ff8f692d261b29c
>> ---------------- ---------------------------
>>           %stddev     %change         %stddev
>>               \          |                \
>>       87135           -45.4%      47612        stress-ng.mknod.ops
>>        1451           -45.4%     793.24        stress-ng.mknod.ops_per_sec
>>       23.00 ±  7%     -51.4%      11.17 ± 12%  stress-ng.time.percent_of_cpu_this_job_got
>>       13.80 ±  7%     -50.7%       6.80 ±  8%  stress-ng.time.system_time
>>      211458 ±  2%     -62.4%      79605 ± 14%  stress-ng.time.voluntary_context_switches
>>        0.30 ±  2%      -0.1        0.18        mpstat.cpu.all.sys%
>>       13993 ±  2%     -38.1%       8659        softirqs.BLOCK
>>       27747            -2.3%      27103        proc-vmstat.nr_slab_reclaimable
>>       76223 ±  3%     -57.3%      32548 ±  5%  proc-vmstat.pgpgout
>>        1170 ±  3%     -57.3%     500.17 ±  5%  vmstat.io.bo
>>       10752           -47.7%       5624 ±  6%  vmstat.system.cs
>>       32820           -34.5%      21499        slabinfo.kmalloc-rcl-64.active_objs
>>      512.67           -34.6%     335.50        slabinfo.kmalloc-rcl-64.active_slabs
>>       32836           -34.5%      21508        slabinfo.kmalloc-rcl-64.num_objs
>>      512.67           -34.6%     335.50        slabinfo.kmalloc-rcl-64.num_slabs
>>       24913 ±  2%     -42.8%      14251        interrupts.315:PCI-MSI.376832-edge.ahci[0000:00:17.0]
>>       71844 ±  6%     -19.9%      57564 ±  3%  interrupts.CAL:Function_call_interrupts
>>      206.50 ± 37%     -42.9%     118.00 ± 41%  interrupts.CPU1.NMI:Non-maskable_interrupts
>>      206.50 ± 37%     -42.9%     118.00 ± 41%  interrupts.CPU1.PMI:Performance_monitoring_interrupts
>>      173.17 ± 26%     -31.7%     118.33 ± 38%  interrupts.CPU2.NMI:Non-maskable_interrupts
>>      173.17 ± 26%     -31.7%     118.33 ± 38%  interrupts.CPU2.PMI:Performance_monitoring_interrupts
>>      189.83 ± 34%     -40.6%     112.83 ± 25%  interrupts.CPU49.NMI:Non-maskable_interrupts
>>      189.83 ± 34%     -40.6%     112.83 ± 25%  interrupts.CPU49.PMI:Performance_monitoring_interrupts
>>      181.67 ± 25%     -38.3%     112.17 ± 24%  interrupts.CPU50.NMI:Non-maskable_interrupts
>>      181.67 ± 25%     -38.3%     112.17 ± 24%  interrupts.CPU50.PMI:Performance_monitoring_interrupts
>>        1895 ± 16%     -32.8%       1274 ± 20%  interrupts.RES:Rescheduling_interrupts
>>   3.127e+08 ±  5%     -12.2%  2.746e+08 ±  3%  perf-stat.i.branch-instructions
>>       10683           -49.8%       5364 ±  6%  perf-stat.i.context-switches
>>   3.866e+08 ±  4%     -14.2%  3.318e+08 ±  3%  perf-stat.i.dTLB-loads
>>   1.949e+08 ±  3%     -14.7%  1.662e+08 ±  3%  perf-stat.i.dTLB-stores
>>   1.538e+09 ±  5%     -12.4%  1.348e+09 ±  3%  perf-stat.i.instructions
>>        9.31 ±  4%     -13.6%       8.04 ±  3%  perf-stat.i.metric.M/sec
>>      127426 ± 14%     -32.6%      85864 ±  9%  perf-stat.i.node-load-misses
>>   3.078e+08 ±  5%     -12.2%  2.702e+08 ±  3%  perf-stat.ps.branch-instructions
>>       10512           -49.8%       5279 ±  6%  perf-stat.ps.context-switches
>>   3.805e+08 ±  4%     -14.2%  3.265e+08 ±  3%  perf-stat.ps.dTLB-loads
>>   1.918e+08 ±  3%     -14.7%  1.635e+08 ±  3%  perf-stat.ps.dTLB-stores
>>   1.514e+09 ±  5%     -12.4%  1.327e+09 ±  3%  perf-stat.ps.instructions
>>      125399 ± 14%     -32.6%      84480 ±  9%  perf-stat.ps.node-load-misses
>>   9.563e+10 ±  6%     -12.3%  8.383e+10 ±  3%  perf-stat.total.instructions
>>        8.31 ± 20%      -3.9        4.45 ± 25%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>>        8.30 ± 20%      -3.9        4.45 ± 25%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        2.35 ± 19%      -1.6        0.75 ± 25%  perf-profile.calltrace.cycles-pp.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        2.35 ± 19%      -1.6        0.75 ± 25%  perf-profile.calltrace.cycles-pp.ksys_sync.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        2.08 ± 19%      -1.6        0.50 ± 72%  perf-profile.calltrace.cycles-pp.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        1.85 ± 25%      -0.7        1.14 ± 31%  perf-profile.calltrace.cycles-pp.do_mknodat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        1.04 ± 27%      -0.7        0.36 ±106%  perf-profile.calltrace.cycles-pp.xfs_inactive_ifree.xfs_inactive.xfs_fs_destroy_inode.destroy_inode.do_unlinkat
>>        1.08 ± 27%      -0.6        0.46 ± 77%  perf-profile.calltrace.cycles-pp.destroy_inode.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        1.05 ± 27%      -0.6        0.45 ± 77%  perf-profile.calltrace.cycles-pp.xfs_fs_destroy_inode.destroy_inode.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>        1.04 ± 27%      -0.6        0.44 ± 78%  perf-profile.calltrace.cycles-pp.xfs_inactive.xfs_fs_destroy_inode.destroy_inode.do_unlinkat.do_syscall_64
>>        0.89 ± 17%      -0.3        0.58 ±  9%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
>>        1.21 ±  8%      +0.2        1.45 ±  7%  perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
>>        8.13 ±  9%      +2.0       10.16 ±  6%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>>        2.35 ± 19%      -1.6        0.75 ± 25%  perf-profile.children.cycles-pp.__x64_sys_sync
>>        2.35 ± 19%      -1.6        0.75 ± 25%  perf-profile.children.cycles-pp.ksys_sync
>>        2.09 ± 19%      -1.4        0.65 ± 25%  perf-profile.children.cycles-pp.iterate_supers
>>        2.25 ± 18%      -1.1        1.14 ± 26%  perf-profile.children.cycles-pp._raw_spin_lock
>>        1.40 ± 25%      -1.1        0.31 ± 54%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>>        1.85 ± 25%      -0.7        1.14 ± 31%  perf-profile.children.cycles-pp.do_mknodat
>>        1.08 ± 27%      -0.5        0.55 ± 43%  perf-profile.children.cycles-pp.destroy_inode
>>        1.05 ± 27%      -0.5        0.54 ± 42%  perf-profile.children.cycles-pp.xfs_fs_destroy_inode
>>        1.04 ± 27%      -0.5        0.54 ± 43%  perf-profile.children.cycles-pp.xfs_inactive
>>        1.04 ± 27%      -0.5        0.53 ± 44%  perf-profile.children.cycles-pp.xfs_inactive_ifree
>>        0.93 ± 21%      -0.5        0.44 ± 24%  perf-profile.children.cycles-pp.__schedule
>>        0.49 ± 25%      -0.4        0.11 ± 76%  perf-profile.children.cycles-pp.pick_next_task_fair
>>        1.05 ± 20%      -0.4        0.69 ± 13%  perf-profile.children.cycles-pp.find_busiest_group
>>        0.66 ± 19%      -0.3        0.32 ± 30%  perf-profile.children.cycles-pp.schedule
>>        1.00 ± 20%      -0.3        0.65 ± 13%  perf-profile.children.cycles-pp.update_sd_lb_stats
>>        0.60 ± 20%      -0.3        0.29 ± 31%  perf-profile.children.cycles-pp.xfs_ifree
>>        0.89 ± 17%      -0.3        0.58 ±  9%  perf-profile.children.cycles-pp.worker_thread
>>        0.44 ± 26%      -0.3        0.16 ± 45%  perf-profile.children.cycles-pp.newidle_balance
>>        0.49 ± 39%      -0.3        0.22 ± 35%  perf-profile.children.cycles-pp.xfs_fs_sync_fs
>>        0.40 ± 49%      -0.2        0.17 ± 29%  perf-profile.children.cycles-pp.xfs_log_force
>>        0.31 ± 29%      -0.2        0.11 ± 56%  perf-profile.children.cycles-pp.schedule_timeout
>>        0.24 ± 60%      -0.2        0.06 ± 50%  perf-profile.children.cycles-pp.xfs_inobt_get_rec
>>        0.38 ± 19%      -0.2        0.21 ± 30%  perf-profile.children.cycles-pp.xfs_btree_lookup
>>        0.42 ± 14%      -0.2        0.26 ± 35%  perf-profile.children.cycles-pp.xfs_check_agi_freecount
>>        0.30 ± 27%      -0.2        0.14 ± 37%  perf-profile.children.cycles-pp.xfs_btree_lookup_get_block
>>        0.25 ± 30%      -0.1        0.10 ± 39%  perf-profile.children.cycles-pp.iterate_bdevs
>>        0.33 ± 28%      -0.1        0.19 ± 20%  perf-profile.children.cycles-pp.try_to_wake_up
>>        0.26 ± 31%      -0.1        0.13 ± 23%  perf-profile.children.cycles-pp.schedule_idle
>>        0.16 ± 36%      -0.1        0.03 ±102%  perf-profile.children.cycles-pp.__down
>>        0.16 ± 35%      -0.1        0.04 ± 73%  perf-profile.children.cycles-pp.xfs_buf_lock
>>        0.16 ± 36%      -0.1        0.04 ± 73%  perf-profile.children.cycles-pp.down
>>        0.21 ± 39%      -0.1        0.09 ± 52%  perf-profile.children.cycles-pp.xfs_iunlink_remove
>>        0.19 ± 28%      -0.1        0.08 ± 43%  perf-profile.children.cycles-pp.xfs_buf_item_release
>>        0.21 ± 33%      -0.1        0.10 ± 35%  perf-profile.children.cycles-pp.xfs_difree_inobt
>>        0.14 ± 16%      -0.1        0.04 ±104%  perf-profile.children.cycles-pp._xfs_trans_bjoin
>>        0.18 ± 30%      -0.1        0.08 ± 80%  perf-profile.children.cycles-pp.wait_for_completion
>>        0.15 ± 27%      -0.1        0.06 ± 53%  perf-profile.children.cycles-pp.up
>>        0.15 ± 33%      -0.1        0.06 ± 81%  perf-profile.children.cycles-pp.xfs_difree_finobt
>>        0.15 ± 27%      -0.1        0.08 ± 19%  perf-profile.children.cycles-pp.xfs_trans_alloc
>>        0.10 ± 29%      -0.1        0.04 ± 72%  perf-profile.children.cycles-pp.xfs_log_reserve
>>        0.04 ± 77%      +0.1        0.10 ± 34%  perf-profile.children.cycles-pp.balance_fair
>>        0.05 ± 79%      +0.1        0.15 ± 31%  perf-profile.children.cycles-pp.smpboot_thread_fn
>>        0.58 ± 16%      +0.1        0.72 ± 11%  perf-profile.children.cycles-pp.update_rq_clock
>>        1.24 ±  8%      +0.3        1.52 ±  4%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
>>        8.31 ±  9%      +2.1       10.40 ±  5%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>>        1.30 ± 18%      -1.0        0.30 ± 54%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>>        0.75 ± 21%      -0.3        0.44 ± 20%  perf-profile.self.cycles-pp.__percpu_counter_sum
>>        0.43 ± 25%      -0.3        0.16 ± 45%  perf-profile.self.cycles-pp.down_read
>>        0.72 ± 13%      -0.2        0.47 ± 12%  perf-profile.self.cycles-pp.update_sd_lb_stats
>>        0.35 ± 15%      +0.1        0.43 ± 13%  perf-profile.self.cycles-pp.irqtime_account_irq
>>        0.08 ± 60%      +0.1        0.16 ± 18%  perf-profile.self.cycles-pp.timerqueue_del
>>
>>
>>                                                                                                              stress-ng.time.system_time
>>                                                                                     20 +----------------------------------------------------------------------+
>>       |            +                                                         |
>>    18 |-+     +   ::                                       .+                |
>>       |      : :  ::                   +..               .+  :               |
>>    16 |-+    : : :  :                 :       +.    +   +    :     +         |
>>       |:    +   ::  :                 :   +  :  +  : : :      :  .. +        |
>>    14 |:+  +    +   :         .+.   .+     + :   + : : :      +.+    +.   .+.|
>>       | +.+         :  +.+.+.+   +.+        +     +   +                +.+   |
>>    12 |-+           :  :                                                     |
>>       |              ::                                                      |
>>    10 |-+            ::                                                      |
>>       |              +                                                       |
>>     8 |-+                    O   O   O      O   O       O O                  |
>>       | O O     O    O     O   O       O      O   O O       O                |
>>     6 +----------------------------------------------------------------------+
>>                                                                                                                                                                                                                                                                       stress-ng.time.percent_of_cpu_this_job_got
>>                                                                                     35 +----------------------------------------------------------------------+
>>       |                                                                      |
>>       |       +    +                                        +                |
>>    30 |-+     ::  ::                                       +:                |
>>       |      : :  ::                   +..    +     +   +.+  :     +         |
>>       |:     :  ::  :                 :      : +   : : :     :   .. :        |
>>    25 |:+  .+   +   :                 :   +. :  +. : : :      :.+   :        |
>>       | +.+         :  +.+.+.+.+.+.+.+      +     +   +       +      +.+.+.+.|
>>    20 |-+           :  :                                                     |
>>       |             : :                                                      |
>>       |              ::                                                      |
>>    15 |-+            :                                                       |
>>       |              +       O   O   O      O   O       O O                  |
>>       | O O O O O  O O O O O   O   O   O  O   O   O O       O                |
>>    10 +----------------------------------------------------------------------+
>>                                                                                                                                                                                                                                                                          stress-ng.time.voluntary_context_switches
>>                                                                                     260000 +------------------------------------------------------------------+
>>    240000 |-+     +                                                          |
>>           |      + :  +.                  +                 .+. .+.          |
>>    220000 |++.+.+  : +  +.   .+.         + + +.+. .+.+   +.+   +   +. .+. .+.|
>>    200000 |-+       +     +.+   +. .+. .+   +    +    + +            +   +   |
>>           |                       +   +                +                     |
>>    180000 |-+                                                                |
>>    160000 |-+                                                                |
>>    140000 |-+                                                                |
>>           |                                                                  |
>>    120000 |-+           O                                                    |
>>    100000 |-O   O         O O                          O                     |
>>           |       O   O                   O    O   O       O   O             |
>>     80000 |-+ O               O O O   O O   OO   O   O   O   O               |
>>     60000 +------------------------------------------------------------------+
>>                                                                                                                                                                                                                                                                                     stress-ng.mknod.ops
>>                                                                                     95000 +-------------------------------------------------------------------+
>>    90000 |-+    .+. .+.                                     .+. .+.          |
>>          |.+.+.+   +   +.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+   +   +. .+.+.+.|
>>    85000 |-+                                                         +       |
>>    80000 |-+                                                                 |
>>          |                                                                   |
>>    75000 |-+                                                                 |
>>    70000 |-+                                                                 |
>>    65000 |-+                                                                 |
>>          |                                                                   |
>>    60000 |-+                                                                 |
>>    55000 |-+                                                                 |
>>          |                                                                   |
>>    50000 |-O O O O O O O O O O O O O O O O O O O O O O O O O   O O           |
>>    45000 +-------------------------------------------------------------------+
>>                                                                                                                                                                                                                                                                                stress-ng.mknod.ops_per_sec
>>                                                                                     1600 +--------------------------------------------------------------------+
>>         |      .+.                                          .+. .+           |
>>    1500 |.+.+.+   +.+.+.+.+.+. .+.+.+.+.+..+.+.+.+.+.+.+.+.+   +  +   .+.+.+.|
>>    1400 |-+                   +                                    +.+       |
>>         |                                                                    |
>>    1300 |-+                                                                  |
>>    1200 |-+                                                                  |
>>         |                                                                    |
>>    1100 |-+                                                                  |
>>    1000 |-+                                                                  |
>>         |                                                                    |
>>     900 |-+                                                                  |
>>     800 |-+   O O   O O O     O O       O      O O     O O O     O           |
>>         | O O     O       O O     O O O    O O     O O       O O             |
>>     700 +--------------------------------------------------------------------+
>>                                                                                                                                                                   [*] bisect-good sample
>> [O] bisect-bad  sample
>>
>>
>>
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or software
>> design or configuration may affect actual performance.
>>
>>
>> ---
>> 0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
>> https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation
>>
>> Thanks,
>> Oliver Sang
>>
> 

  reply	other threads:[~2021-09-09  7:07 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-04 14:38 [xfs] 2bf1ec0ff0: stress-ng.mknod.ops_per_sec -45.4% regression kernel test robot
2021-08-04 14:38 ` kernel test robot
2021-09-03  7:43 ` Xing Zhengjun
2021-09-03  7:43   ` Xing Zhengjun
2021-09-09  7:07   ` Yin Fengwei [this message]
2021-09-09  7:07     ` [LKP] " Yin Fengwei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ec35eb2e-a488-ecfe-22c6-bf86d7eef145@intel.com \
    --to=fengwei.yin@intel.com \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.