Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression

public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
       [not found] <202210081045.77ddf59b-yujie.liu@intel.com>
@ 2022-10-08  8:00 ` Yu Kuai
  2022-10-09  5:47   ` [LKP] " Yin Fengwei
  2022-10-09  8:43   ` Ming Lei
  0 siblings, 2 replies; 6+ messages in thread
From: Yu Kuai @ 2022-10-08  8:00 UTC (permalink / raw)
  To: kernel test robot
  Cc: lkp, lkp, Jens Axboe, linux-kernel, linux-block, ying.huang,
	feng.tang, zhengjun.xing, fengwei.yin, yukuai (C)

Hi,

在 2022/10/08 10:50, kernel test robot 写道:
> Greeting,
> 
> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:

I don't know how this is working but I'm *sure* this commit won't affect
performance. Please take a look at the commit, only wbt initialization
is touched, which is done while creating the device:

device_add_disk
  blk_register_queue
   wbt_enable_default
    wbt_init

And io path is the same with or without this commit.

By the way, wbt should only work for write.

Thanks,
Kuai
> 
> commit: 8c5035dfbb9475b67c82b3fdb7351236525bf52b ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: fio-basic
> on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory
> with following parameters:
> 
> 	runtime: 300s
> 	nr_task: 8t
> 	disk: 1SSD
> 	fs: btrfs
> 	rw: randread
> 	bs: 2M
> 	ioengine: sync
> 	test_size: 256g
> 	cpufreq_governor: performance
> 
> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
> test-url: https://github.com/axboe/fio
> 
> 
> Details are as below:
> 
> =========================================================================================
> bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
>    2M/gcc-11/performance/1SSD/btrfs/sync/x86_64-rhel-8.3/8t/debian-11.1-x86_64-20220510.cgz/300s/randread/lkp-csl-2ap4/256g/fio-basic
> 
> commit:
>    f7de4886fe ("rnbd-srv: remove struct rnbd_dev")
>    8c5035dfbb ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
> 
> f7de4886fe8f008a 8c5035dfbb9475b67c82b3fdb73
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>        0.03 ±106%      +0.2        0.22 ± 80%  fio.latency_20ms%
>        0.02 ± 33%      -0.0        0.01 ± 12%  fio.latency_4ms%
>        2508           -10.6%       2243        fio.read_bw_MBps
>     6717440           +17.6%    7897088        fio.read_clat_90%_us
>     6892202           +19.0%    8202922        fio.read_clat_95%_us
>     7602176 ±  4%     +18.4%    9000277 ±  3%  fio.read_clat_99%_us
>     6374238           +11.8%    7127450        fio.read_clat_mean_us
>      363825 ± 10%     +74.9%     636378 ±  5%  fio.read_clat_stddev
>        1254           -10.6%       1121        fio.read_iops
>      104.97           +11.8%     117.32        fio.time.elapsed_time
>      104.97           +11.8%     117.32        fio.time.elapsed_time.max
>       13731            +5.6%      14498 ±  4%  fio.time.maximum_resident_set_size
>      116.00            -8.2%     106.50        fio.time.percent_of_cpu_this_job_got
>   1.998e+10           +11.4%  2.226e+10        cpuidle..time
>        3.27 ±  3%      +4.6%       3.42        iostat.cpu.iowait
>        4.49 ± 68%      -2.1        2.38 ±152%  perf-profile.children.cycles-pp.number
>        4.49 ± 68%      -2.5        1.98 ±175%  perf-profile.self.cycles-pp.number
>      557763            +5.4%     587781        proc-vmstat.pgfault
>       25488            +3.1%      26274        proc-vmstat.pgreuse
>     2459048           -10.1%    2209482        vmstat.io.bi
>      184649 ±  5%     -10.4%     165526 ±  7%  vmstat.system.cs
>      111733 ± 30%     +61.8%     180770 ± 21%  numa-meminfo.node0.AnonPages
>      113221 ± 30%     +60.2%     181416 ± 21%  numa-meminfo.node0.Inactive(anon)
>       11301 ± 24%    +164.5%      29888 ±117%  numa-meminfo.node2.Active(file)
>      104911 ± 39%     -80.5%      20456 ±100%  numa-meminfo.node3.AnonHugePages
>      131666 ± 27%     -67.9%      42297 ± 82%  numa-meminfo.node3.AnonPages
>      132698 ± 26%     -67.5%      43158 ± 81%  numa-meminfo.node3.Inactive(anon)
>       27934 ± 30%     +61.8%      45196 ± 21%  numa-vmstat.node0.nr_anon_pages
>       28306 ± 30%     +60.2%      45358 ± 21%  numa-vmstat.node0.nr_inactive_anon
>       28305 ± 30%     +60.2%      45357 ± 21%  numa-vmstat.node0.nr_zone_inactive_anon
>        6291 ± 24%     +68.0%      10567 ± 26%  numa-vmstat.node2.workingset_nodes
>       32925 ± 27%     -67.9%      10571 ± 82%  numa-vmstat.node3.nr_anon_pages
>       33182 ± 26%     -67.5%      10786 ± 81%  numa-vmstat.node3.nr_inactive_anon
>       33182 ± 26%     -67.5%      10786 ± 81%  numa-vmstat.node3.nr_zone_inactive_anon
>      161.78 ±  4%     -28.2%     116.10 ± 30%  sched_debug.cfs_rq:/.runnable_avg.avg
>      161.46 ±  4%     -28.2%     115.85 ± 30%  sched_debug.cfs_rq:/.util_avg.avg
>      426382           +11.0%     473345 ±  6%  sched_debug.cpu.clock.avg
>      426394           +11.0%     473357 ±  6%  sched_debug.cpu.clock.max
>      426370           +11.0%     473331 ±  6%  sched_debug.cpu.clock.min
>      426139           +10.9%     472586 ±  6%  sched_debug.cpu.clock_task.avg
>      426368           +11.0%     473130 ±  6%  sched_debug.cpu.clock_task.max
>      416196           +11.1%     462228 ±  6%  sched_debug.cpu.clock_task.min
>        1156 ±  7%     -10.8%       1031 ±  6%  sched_debug.cpu.curr->pid.stddev
>      426372           +11.0%     473334 ±  6%  sched_debug.cpu_clk
>      425355           +11.0%     472318 ±  6%  sched_debug.ktime
>      426826           +11.0%     473787 ±  6%  sched_debug.sched_clk
>   1.263e+09            -7.9%  1.164e+09 ±  3%  perf-stat.i.branch-instructions
>      190886 ±  5%     -10.8%     170290 ±  7%  perf-stat.i.context-switches
>   1.979e+09            -8.8%  1.804e+09 ±  2%  perf-stat.i.dTLB-loads
>   8.998e+08            -8.2%  8.257e+08 ±  2%  perf-stat.i.dTLB-stores
>   6.455e+09            -8.0%  5.938e+09 ±  3%  perf-stat.i.instructions
>       21.78            -8.4%      19.95        perf-stat.i.metric.M/sec
>     7045315 ±  4%     -14.0%    6057863 ±  6%  perf-stat.i.node-load-misses
>     2658563 ±  7%     -21.9%    2077647 ± 12%  perf-stat.i.node-loads
>      414822 ±  4%     -12.9%     361455 ±  3%  perf-stat.i.node-store-misses
>   1.251e+09            -7.8%  1.154e+09 ±  3%  perf-stat.ps.branch-instructions
>      189082 ±  5%     -10.7%     168849 ±  7%  perf-stat.ps.context-switches
>    1.96e+09            -8.8%  1.789e+09 ±  2%  perf-stat.ps.dTLB-loads
>   8.912e+08            -8.1%  8.187e+08 ±  2%  perf-stat.ps.dTLB-stores
>   6.393e+09            -7.9%  5.888e+09 ±  3%  perf-stat.ps.instructions
>     6978485 ±  4%     -13.9%    6006510 ±  6%  perf-stat.ps.node-load-misses
>     2633627 ±  7%     -21.8%    2060033 ± 12%  perf-stat.ps.node-loads
>      410822 ±  4%     -12.8%     358289 ±  3%  perf-stat.ps.node-store-misses
> 
> 
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <yujie.liu@intel.com>
> | Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com
> 
> 
> To reproduce:
> 
>          git clone https://github.com/intel/lkp-tests.git
>          cd lkp-tests
>          sudo bin/lkp install job.yaml           # job file is attached in this email
>          bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>          sudo bin/lkp run generated-yaml-file
> 
>          # if come across any failure that blocks the test,
>          # please remove ~/.lkp and /lkp dir to run from a clean state.
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [LKP] Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
  2022-10-08  8:00 ` [blk] 8c5035dfbb: fio.read_iops -10.6% regression Yu Kuai
@ 2022-10-09  5:47   ` Yin Fengwei
  2022-10-09  6:14     ` Yu Kuai
  2022-10-09  8:43   ` Ming Lei
  1 sibling, 1 reply; 6+ messages in thread
From: Yin Fengwei @ 2022-10-09  5:47 UTC (permalink / raw)
  To: Yu Kuai, kernel test robot
  Cc: lkp, lkp, Jens Axboe, linux-kernel, linux-block, yukuai (C)

Hi Kuai,

On 10/8/22 16:00, Yu Kuai wrote:
> Hi,
> 
> 在 2022/10/08 10:50, kernel test robot 写道:
>> Greeting,
>>
>> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
> 
> I don't know how this is working but I'm *sure* this commit won't affect
> performance. Please take a look at the commit, only wbt initialization
> is touched, which is done while creating the device:
> 
> device_add_disk
>  blk_register_queue
>   wbt_enable_default
>    wbt_init
> 
> And io path is the same with or without this commit.
> 
> By the way, wbt should only work for write.
Some information here:
It looks like the line
    wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
matters.

If move only this line to original position based on 8c5035dfbb,
the regression is gone.

If move only this line before ret = rq_qos_add() (just like your patch
did, but only with this line) based on 8c5035dfbb, the regression can
be reproduced.


Regards
Yin, Fengwei

> 
> Thanks,
> Kuai
>>
>> commit: 8c5035dfbb9475b67c82b3fdb7351236525bf52b ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> in testcase: fio-basic
>> on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory
>> with following parameters:
>>
>>     runtime: 300s
>>     nr_task: 8t
>>     disk: 1SSD
>>     fs: btrfs
>>     rw: randread
>>     bs: 2M
>>     ioengine: sync
>>     test_size: 256g
>>     cpufreq_governor: performance
>>
>> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
>> test-url: https://github.com/axboe/fio
>>
>>
>> Details are as below:
>>
>> =========================================================================================
>> bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
>>    2M/gcc-11/performance/1SSD/btrfs/sync/x86_64-rhel-8.3/8t/debian-11.1-x86_64-20220510.cgz/300s/randread/lkp-csl-2ap4/256g/fio-basic
>>
>> commit:
>>    f7de4886fe ("rnbd-srv: remove struct rnbd_dev")
>>    8c5035dfbb ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
>>
>> f7de4886fe8f008a 8c5035dfbb9475b67c82b3fdb73
>> ---------------- ---------------------------
>>           %stddev     %change         %stddev
>>               \          |                \
>>        0.03 ±106%      +0.2        0.22 ± 80%  fio.latency_20ms%
>>        0.02 ± 33%      -0.0        0.01 ± 12%  fio.latency_4ms%
>>        2508           -10.6%       2243        fio.read_bw_MBps
>>     6717440           +17.6%    7897088        fio.read_clat_90%_us
>>     6892202           +19.0%    8202922        fio.read_clat_95%_us
>>     7602176 ±  4%     +18.4%    9000277 ±  3%  fio.read_clat_99%_us
>>     6374238           +11.8%    7127450        fio.read_clat_mean_us
>>      363825 ± 10%     +74.9%     636378 ±  5%  fio.read_clat_stddev
>>        1254           -10.6%       1121        fio.read_iops
>>      104.97           +11.8%     117.32        fio.time.elapsed_time
>>      104.97           +11.8%     117.32        fio.time.elapsed_time.max
>>       13731            +5.6%      14498 ±  4%  fio.time.maximum_resident_set_size
>>      116.00            -8.2%     106.50        fio.time.percent_of_cpu_this_job_got
>>   1.998e+10           +11.4%  2.226e+10        cpuidle..time
>>        3.27 ±  3%      +4.6%       3.42        iostat.cpu.iowait
>>        4.49 ± 68%      -2.1        2.38 ±152%  perf-profile.children.cycles-pp.number
>>        4.49 ± 68%      -2.5        1.98 ±175%  perf-profile.self.cycles-pp.number
>>      557763            +5.4%     587781        proc-vmstat.pgfault
>>       25488            +3.1%      26274        proc-vmstat.pgreuse
>>     2459048           -10.1%    2209482        vmstat.io.bi
>>      184649 ±  5%     -10.4%     165526 ±  7%  vmstat.system.cs
>>      111733 ± 30%     +61.8%     180770 ± 21%  numa-meminfo.node0.AnonPages
>>      113221 ± 30%     +60.2%     181416 ± 21%  numa-meminfo.node0.Inactive(anon)
>>       11301 ± 24%    +164.5%      29888 ±117%  numa-meminfo.node2.Active(file)
>>      104911 ± 39%     -80.5%      20456 ±100%  numa-meminfo.node3.AnonHugePages
>>      131666 ± 27%     -67.9%      42297 ± 82%  numa-meminfo.node3.AnonPages
>>      132698 ± 26%     -67.5%      43158 ± 81%  numa-meminfo.node3.Inactive(anon)
>>       27934 ± 30%     +61.8%      45196 ± 21%  numa-vmstat.node0.nr_anon_pages
>>       28306 ± 30%     +60.2%      45358 ± 21%  numa-vmstat.node0.nr_inactive_anon
>>       28305 ± 30%     +60.2%      45357 ± 21%  numa-vmstat.node0.nr_zone_inactive_anon
>>        6291 ± 24%     +68.0%      10567 ± 26%  numa-vmstat.node2.workingset_nodes
>>       32925 ± 27%     -67.9%      10571 ± 82%  numa-vmstat.node3.nr_anon_pages
>>       33182 ± 26%     -67.5%      10786 ± 81%  numa-vmstat.node3.nr_inactive_anon
>>       33182 ± 26%     -67.5%      10786 ± 81%  numa-vmstat.node3.nr_zone_inactive_anon
>>      161.78 ±  4%     -28.2%     116.10 ± 30%  sched_debug.cfs_rq:/.runnable_avg.avg
>>      161.46 ±  4%     -28.2%     115.85 ± 30%  sched_debug.cfs_rq:/.util_avg.avg
>>      426382           +11.0%     473345 ±  6%  sched_debug.cpu.clock.avg
>>      426394           +11.0%     473357 ±  6%  sched_debug.cpu.clock.max
>>      426370           +11.0%     473331 ±  6%  sched_debug.cpu.clock.min
>>      426139           +10.9%     472586 ±  6%  sched_debug.cpu.clock_task.avg
>>      426368           +11.0%     473130 ±  6%  sched_debug.cpu.clock_task.max
>>      416196           +11.1%     462228 ±  6%  sched_debug.cpu.clock_task.min
>>        1156 ±  7%     -10.8%       1031 ±  6%  sched_debug.cpu.curr->pid.stddev
>>      426372           +11.0%     473334 ±  6%  sched_debug.cpu_clk
>>      425355           +11.0%     472318 ±  6%  sched_debug.ktime
>>      426826           +11.0%     473787 ±  6%  sched_debug.sched_clk
>>   1.263e+09            -7.9%  1.164e+09 ±  3%  perf-stat.i.branch-instructions
>>      190886 ±  5%     -10.8%     170290 ±  7%  perf-stat.i.context-switches
>>   1.979e+09            -8.8%  1.804e+09 ±  2%  perf-stat.i.dTLB-loads
>>   8.998e+08            -8.2%  8.257e+08 ±  2%  perf-stat.i.dTLB-stores
>>   6.455e+09            -8.0%  5.938e+09 ±  3%  perf-stat.i.instructions
>>       21.78            -8.4%      19.95        perf-stat.i.metric.M/sec
>>     7045315 ±  4%     -14.0%    6057863 ±  6%  perf-stat.i.node-load-misses
>>     2658563 ±  7%     -21.9%    2077647 ± 12%  perf-stat.i.node-loads
>>      414822 ±  4%     -12.9%     361455 ±  3%  perf-stat.i.node-store-misses
>>   1.251e+09            -7.8%  1.154e+09 ±  3%  perf-stat.ps.branch-instructions
>>      189082 ±  5%     -10.7%     168849 ±  7%  perf-stat.ps.context-switches
>>    1.96e+09            -8.8%  1.789e+09 ±  2%  perf-stat.ps.dTLB-loads
>>   8.912e+08            -8.1%  8.187e+08 ±  2%  perf-stat.ps.dTLB-stores
>>   6.393e+09            -7.9%  5.888e+09 ±  3%  perf-stat.ps.instructions
>>     6978485 ±  4%     -13.9%    6006510 ±  6%  perf-stat.ps.node-load-misses
>>     2633627 ±  7%     -21.8%    2060033 ± 12%  perf-stat.ps.node-loads
>>      410822 ±  4%     -12.8%     358289 ±  3%  perf-stat.ps.node-store-misses
>>
>>
>> If you fix the issue, kindly add following tag
>> | Reported-by: kernel test robot <yujie.liu@intel.com>
>> | Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com
>>
>>
>> To reproduce:
>>
>>          git clone https://github.com/intel/lkp-tests.git
>>          cd lkp-tests
>>          sudo bin/lkp install job.yaml           # job file is attached in this email
>>          bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>>          sudo bin/lkp run generated-yaml-file
>>
>>          # if come across any failure that blocks the test,
>>          # please remove ~/.lkp and /lkp dir to run from a clean state.
>>
>>
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or software
>> design or configuration may affect actual performance.
>>
>>
> _______________________________________________
> LKP mailing list -- lkp@lists.01.org
> To unsubscribe send an email to lkp-leave@lists.01.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [LKP] Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
  2022-10-09  5:47   ` [LKP] " Yin Fengwei
@ 2022-10-09  6:14     ` Yu Kuai
  0 siblings, 0 replies; 6+ messages in thread
From: Yu Kuai @ 2022-10-09  6:14 UTC (permalink / raw)
  To: Yin Fengwei, Yu Kuai, kernel test robot
  Cc: lkp, lkp, Jens Axboe, linux-kernel, linux-block, yukuai (C)

Hi,

在 2022/10/09 13:47, Yin Fengwei 写道:
> Hi Kuai,
> 
> On 10/8/22 16:00, Yu Kuai wrote:
>> Hi,
>>
>> 在 2022/10/08 10:50, kernel test robot 写道:
>>> Greeting,
>>>
>>> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
>>
>> I don't know how this is working but I'm *sure* this commit won't affect
>> performance. Please take a look at the commit, only wbt initialization
>> is touched, which is done while creating the device:
>>
>> device_add_disk
>>   blk_register_queue
>>    wbt_enable_default
>>     wbt_init
>>
>> And io path is the same with or without this commit.
>>
>> By the way, wbt should only work for write.
> Some information here:
> It looks like the line
>      wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
> matters.
> 
> If move only this line to original position based on 8c5035dfbb,
> the regression is gone.
> 
> If move only this line before ret = rq_qos_add() (just like your patch
> did, but only with this line) based on 8c5035dfbb, the regression can
> be reproduced.
> 

Thanks for the information, but I still don't understand if there is any
difference after wbt_init() is done, and how does read is afftected by
wbt. 🙁
> 
> Regards
> Yin, Fengwei
> 
>>
>> Thanks,
>> Kuai
>>>
>>> commit: 8c5035dfbb9475b67c82b3fdb7351236525bf52b ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> in testcase: fio-basic
>>> on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory
>>> with following parameters:
>>>
>>>      runtime: 300s
>>>      nr_task: 8t
>>>      disk: 1SSD
>>>      fs: btrfs
>>>      rw: randread
>>>      bs: 2M
>>>      ioengine: sync
>>>      test_size: 256g
>>>      cpufreq_governor: performance
>>>
>>> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
>>> test-url: https://github.com/axboe/fio
>>>
>>>
>>> Details are as below:
>>>
>>> =========================================================================================
>>> bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
>>>     2M/gcc-11/performance/1SSD/btrfs/sync/x86_64-rhel-8.3/8t/debian-11.1-x86_64-20220510.cgz/300s/randread/lkp-csl-2ap4/256g/fio-basic
>>>
>>> commit:
>>>     f7de4886fe ("rnbd-srv: remove struct rnbd_dev")
>>>     8c5035dfbb ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
>>>
>>> f7de4886fe8f008a 8c5035dfbb9475b67c82b3fdb73
>>> ---------------- ---------------------------
>>>            %stddev     %change         %stddev
>>>                \          |                \
>>>         0.03 ±106%      +0.2        0.22 ± 80%  fio.latency_20ms%
>>>         0.02 ± 33%      -0.0        0.01 ± 12%  fio.latency_4ms%
>>>         2508           -10.6%       2243        fio.read_bw_MBps
>>>      6717440           +17.6%    7897088        fio.read_clat_90%_us
>>>      6892202           +19.0%    8202922        fio.read_clat_95%_us
>>>      7602176 ±  4%     +18.4%    9000277 ±  3%  fio.read_clat_99%_us
>>>      6374238           +11.8%    7127450        fio.read_clat_mean_us
>>>       363825 ± 10%     +74.9%     636378 ±  5%  fio.read_clat_stddev
>>>         1254           -10.6%       1121        fio.read_iops
>>>       104.97           +11.8%     117.32        fio.time.elapsed_time
>>>       104.97           +11.8%     117.32        fio.time.elapsed_time.max
>>>        13731            +5.6%      14498 ±  4%  fio.time.maximum_resident_set_size
>>>       116.00            -8.2%     106.50        fio.time.percent_of_cpu_this_job_got
>>>    1.998e+10           +11.4%  2.226e+10        cpuidle..time
>>>         3.27 ±  3%      +4.6%       3.42        iostat.cpu.iowait
>>>         4.49 ± 68%      -2.1        2.38 ±152%  perf-profile.children.cycles-pp.number
>>>         4.49 ± 68%      -2.5        1.98 ±175%  perf-profile.self.cycles-pp.number
>>>       557763            +5.4%     587781        proc-vmstat.pgfault
>>>        25488            +3.1%      26274        proc-vmstat.pgreuse
>>>      2459048           -10.1%    2209482        vmstat.io.bi
>>>       184649 ±  5%     -10.4%     165526 ±  7%  vmstat.system.cs
>>>       111733 ± 30%     +61.8%     180770 ± 21%  numa-meminfo.node0.AnonPages
>>>       113221 ± 30%     +60.2%     181416 ± 21%  numa-meminfo.node0.Inactive(anon)
>>>        11301 ± 24%    +164.5%      29888 ±117%  numa-meminfo.node2.Active(file)
>>>       104911 ± 39%     -80.5%      20456 ±100%  numa-meminfo.node3.AnonHugePages
>>>       131666 ± 27%     -67.9%      42297 ± 82%  numa-meminfo.node3.AnonPages
>>>       132698 ± 26%     -67.5%      43158 ± 81%  numa-meminfo.node3.Inactive(anon)
>>>        27934 ± 30%     +61.8%      45196 ± 21%  numa-vmstat.node0.nr_anon_pages
>>>        28306 ± 30%     +60.2%      45358 ± 21%  numa-vmstat.node0.nr_inactive_anon
>>>        28305 ± 30%     +60.2%      45357 ± 21%  numa-vmstat.node0.nr_zone_inactive_anon
>>>         6291 ± 24%     +68.0%      10567 ± 26%  numa-vmstat.node2.workingset_nodes
>>>        32925 ± 27%     -67.9%      10571 ± 82%  numa-vmstat.node3.nr_anon_pages
>>>        33182 ± 26%     -67.5%      10786 ± 81%  numa-vmstat.node3.nr_inactive_anon
>>>        33182 ± 26%     -67.5%      10786 ± 81%  numa-vmstat.node3.nr_zone_inactive_anon
>>>       161.78 ±  4%     -28.2%     116.10 ± 30%  sched_debug.cfs_rq:/.runnable_avg.avg
>>>       161.46 ±  4%     -28.2%     115.85 ± 30%  sched_debug.cfs_rq:/.util_avg.avg
>>>       426382           +11.0%     473345 ±  6%  sched_debug.cpu.clock.avg
>>>       426394           +11.0%     473357 ±  6%  sched_debug.cpu.clock.max
>>>       426370           +11.0%     473331 ±  6%  sched_debug.cpu.clock.min
>>>       426139           +10.9%     472586 ±  6%  sched_debug.cpu.clock_task.avg
>>>       426368           +11.0%     473130 ±  6%  sched_debug.cpu.clock_task.max
>>>       416196           +11.1%     462228 ±  6%  sched_debug.cpu.clock_task.min
>>>         1156 ±  7%     -10.8%       1031 ±  6%  sched_debug.cpu.curr->pid.stddev
>>>       426372           +11.0%     473334 ±  6%  sched_debug.cpu_clk
>>>       425355           +11.0%     472318 ±  6%  sched_debug.ktime
>>>       426826           +11.0%     473787 ±  6%  sched_debug.sched_clk
>>>    1.263e+09            -7.9%  1.164e+09 ±  3%  perf-stat.i.branch-instructions
>>>       190886 ±  5%     -10.8%     170290 ±  7%  perf-stat.i.context-switches
>>>    1.979e+09            -8.8%  1.804e+09 ±  2%  perf-stat.i.dTLB-loads
>>>    8.998e+08            -8.2%  8.257e+08 ±  2%  perf-stat.i.dTLB-stores
>>>    6.455e+09            -8.0%  5.938e+09 ±  3%  perf-stat.i.instructions
>>>        21.78            -8.4%      19.95        perf-stat.i.metric.M/sec
>>>      7045315 ±  4%     -14.0%    6057863 ±  6%  perf-stat.i.node-load-misses
>>>      2658563 ±  7%     -21.9%    2077647 ± 12%  perf-stat.i.node-loads
>>>       414822 ±  4%     -12.9%     361455 ±  3%  perf-stat.i.node-store-misses
>>>    1.251e+09            -7.8%  1.154e+09 ±  3%  perf-stat.ps.branch-instructions
>>>       189082 ±  5%     -10.7%     168849 ±  7%  perf-stat.ps.context-switches
>>>     1.96e+09            -8.8%  1.789e+09 ±  2%  perf-stat.ps.dTLB-loads
>>>    8.912e+08            -8.1%  8.187e+08 ±  2%  perf-stat.ps.dTLB-stores
>>>    6.393e+09            -7.9%  5.888e+09 ±  3%  perf-stat.ps.instructions
>>>      6978485 ±  4%     -13.9%    6006510 ±  6%  perf-stat.ps.node-load-misses
>>>      2633627 ±  7%     -21.8%    2060033 ± 12%  perf-stat.ps.node-loads
>>>       410822 ±  4%     -12.8%     358289 ±  3%  perf-stat.ps.node-store-misses
>>>
>>>
>>> If you fix the issue, kindly add following tag
>>> | Reported-by: kernel test robot <yujie.liu@intel.com>
>>> | Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com
>>>
>>>
>>> To reproduce:
>>>
>>>           git clone https://github.com/intel/lkp-tests.git
>>>           cd lkp-tests
>>>           sudo bin/lkp install job.yaml           # job file is attached in this email
>>>           bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>>>           sudo bin/lkp run generated-yaml-file
>>>
>>>           # if come across any failure that blocks the test,
>>>           # please remove ~/.lkp and /lkp dir to run from a clean state.
>>>
>>>
>>> Disclaimer:
>>> Results have been estimated based on internal Intel analysis and are provided
>>> for informational purposes only. Any difference in system hardware or software
>>> design or configuration may affect actual performance.
>>>
>>>
>> _______________________________________________
>> LKP mailing list -- lkp@lists.01.org
>> To unsubscribe send an email to lkp-leave@lists.01.org
> 
> .
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
  2022-10-08  8:00 ` [blk] 8c5035dfbb: fio.read_iops -10.6% regression Yu Kuai
  2022-10-09  5:47   ` [LKP] " Yin Fengwei
@ 2022-10-09  8:43   ` Ming Lei
  2022-10-09  9:32     ` Yu Kuai
  1 sibling, 1 reply; 6+ messages in thread
From: Ming Lei @ 2022-10-09  8:43 UTC (permalink / raw)
  To: Yu Kuai
  Cc: kernel test robot, lkp, lkp, Jens Axboe, linux-kernel,
	linux-block, ying.huang, feng.tang, zhengjun.xing, fengwei.yin,
	yukuai (C)

On Sat, Oct 08, 2022 at 04:00:10PM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2022/10/08 10:50, kernel test robot 写道:
> > Greeting,
> > 
> > FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
> 
> I don't know how this is working but I'm *sure* this commit won't affect

Looks it is wrong to move 

	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));

before rq_qos_add() in wbt_init().

Without adding wbt rq_qos, wbt_set_write_cache is just a nop.


thanks,
Ming


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
  2022-10-09  8:43   ` Ming Lei
@ 2022-10-09  9:32     ` Yu Kuai
  2022-10-13  7:42       ` Feng Tang
  0 siblings, 1 reply; 6+ messages in thread
From: Yu Kuai @ 2022-10-09  9:32 UTC (permalink / raw)
  To: Ming Lei, Yu Kuai
  Cc: kernel test robot, lkp, lkp, Jens Axboe, linux-kernel,
	linux-block, ying.huang, feng.tang, zhengjun.xing, fengwei.yin,
	yukuai (C)

Hi,

在 2022/10/09 16:43, Ming Lei 写道:
> On Sat, Oct 08, 2022 at 04:00:10PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2022/10/08 10:50, kernel test robot 写道:
>>> Greeting,
>>>
>>> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
>>
>> I don't know how this is working but I'm *sure* this commit won't affect
> 
> Looks it is wrong to move
> 
> 	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
> 
> before rq_qos_add() in wbt_init().
> 
> Without adding wbt rq_qos, wbt_set_write_cache is just a nop.

Yes, I got it now, I'm being foolish here.

I missed that "rwb->wc" is got by rq_qos in wbt_set_write_cache(), which
is NULL before rq_qos_add(). By the way, it's interesting that how read
performance is affected, I still don't know why yet...

Thanks,
Kuai
> 
> 
> thanks,
> Ming
> 
> .
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
  2022-10-09  9:32     ` Yu Kuai
@ 2022-10-13  7:42       ` Feng Tang
  0 siblings, 0 replies; 6+ messages in thread
From: Feng Tang @ 2022-10-13  7:42 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Ming Lei, Liu, Yujie, lkp@lists.01.org, lkp, Jens Axboe,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	Huang, Ying, zhengjun.xing@linux.intel.com, Yin, Fengwei,
	yukuai (C)

On Sun, Oct 09, 2022 at 05:32:34PM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2022/10/09 16:43, Ming Lei 写道:
> > On Sat, Oct 08, 2022 at 04:00:10PM +0800, Yu Kuai wrote:
> >> Hi,
> >>
> >> 在 2022/10/08 10:50, kernel test robot 写道:
> >>> Greeting,
> >>>
> >>> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
> >>
> >> I don't know how this is working but I'm *sure* this commit won't affect
> > 
> > Looks it is wrong to move
> > 
> > 	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
> > 
> > before rq_qos_add() in wbt_init().
> > 
> > Without adding wbt rq_qos, wbt_set_write_cache is just a nop.
> 
> Yes, I got it now, I'm being foolish here.
> 
> I missed that "rwb->wc" is got by rq_qos in wbt_set_write_cache(), which
> is NULL before rq_qos_add(). By the way, it's interesting that how read
> performance is affected, I still don't know why yet...

Indeed, we are confused too. So we did some further check, and found
it could be related with the less calls of wake_up_all(), due to the
'rwb->wc' value changed. 

I'm not familiar with the block layer and VFS, and just checked the
'blk-wbt.c'. Before commit 8c5035dfbb, the 'rwb->wc' is 0 in 0Day's
test env, while it's 1 after the commit.

Inside wbt_rqw_done(), 'rwb->wc' be used to judge whether to wakeup
other waiters in system, so we add some debug code to check the
wakeup and skip-wakeup counter:

  ----------------------------------------------------------------
  @@ -130,6 +130,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
   	/*
   	 * Don't wake anyone up if we are above the normal limit.
   	 */
  -	if (inflight && inflight >= limit)
  +	if (inflight && inflight >= limit) {
  +		skip_wakeup++;
   		return;
  +	}
   
   	if (wq_has_sleeper(&rqw->wait)) {
   		int diff = limit - inflight;
   
  -		if (!inflight || diff >= rwb->wb_background / 2)
  +		if (!inflight || diff >= rwb->wb_background / 2) {
  +			wakeup++;
   			wake_up_all(&rqw->wait);
  +		}
  ----------------------------------------------------------------

And after the fio task, the 'skip_wakeup' number is much bigger
after the patch:
  
  before patch:
      422.274394: wbt_rqw_done: wakeup_skip=19408 wakup_all=1944759
  
  after patch:
      433.753345: wbt_rqw_done: wakeup_skip=2090585 wakup_all=13630

Hope this can help the root causing.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-10-13  7:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <202210081045.77ddf59b-yujie.liu@intel.com>
2022-10-08  8:00 ` [blk] 8c5035dfbb: fio.read_iops -10.6% regression Yu Kuai
2022-10-09  5:47   ` [LKP] " Yin Fengwei
2022-10-09  6:14     ` Yu Kuai
2022-10-09  8:43   ` Ming Lei
2022-10-09  9:32     ` Yu Kuai
2022-10-13  7:42       ` Feng Tang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox