* Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
       [not found] <202210081045.77ddf59b-yujie.liu@intel.com>
@ 2022-10-08  8:00 ` Yu Kuai
  2022-10-09  5:47   ` [LKP] " Yin Fengwei
  2022-10-09  8:43   ` Ming Lei
  0 siblings, 2 replies; 6+ messages in thread
From: Yu Kuai @ 2022-10-08  8:00 UTC
To: kernel test robot
Cc: lkp, lkp, Jens Axboe, linux-kernel, linux-block, ying.huang,
	feng.tang, zhengjun.xing, fengwei.yin, yukuai (C)

Hi,

On 2022/10/08 10:50, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:

I don't know how this is happening, but I'm *sure* this commit won't affect
performance. Please take a look at the commit: only wbt initialization
is touched, which is done while creating the device:

device_add_disk
 blk_register_queue
  wbt_enable_default
   wbt_init

And the I/O path is the same with or without this commit.

By the way, wbt should only apply to writes.

Thanks,
Kuai

>
> commit: 8c5035dfbb9475b67c82b3fdb7351236525bf52b ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: fio-basic
> on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory
> with following parameters:
>
>	runtime: 300s
>	nr_task: 8t
>	disk: 1SSD
>	fs: btrfs
>	rw: randread
>	bs: 2M
>	ioengine: sync
>	test_size: 256g
>	cpufreq_governor: performance
>
> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
> test-url: https://github.com/axboe/fio
>
>
> Details are as below:
>
> =========================================================================================
> bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
>   2M/gcc-11/performance/1SSD/btrfs/sync/x86_64-rhel-8.3/8t/debian-11.1-x86_64-20220510.cgz/300s/randread/lkp-csl-2ap4/256g/fio-basic
>
> commit:
>   f7de4886fe ("rnbd-srv: remove struct rnbd_dev")
>   8c5035dfbb ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
>
> f7de4886fe8f008a 8c5035dfbb9475b67c82b3fdb73
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>       0.03 ±106%     +0.2       0.22 ± 80%  fio.latency_20ms%
>       0.02 ± 33%     -0.0       0.01 ± 12%  fio.latency_4ms%
>       2508         -10.6%       2243        fio.read_bw_MBps
>    6717440         +17.6%    7897088        fio.read_clat_90%_us
>    6892202         +19.0%    8202922        fio.read_clat_95%_us
>    7602176 ±  4%   +18.4%    9000277 ±  3%  fio.read_clat_99%_us
>    6374238         +11.8%    7127450        fio.read_clat_mean_us
>     363825 ± 10%   +74.9%     636378 ±  5%  fio.read_clat_stddev
>       1254         -10.6%       1121        fio.read_iops
>     104.97         +11.8%     117.32        fio.time.elapsed_time
>     104.97         +11.8%     117.32        fio.time.elapsed_time.max
>      13731          +5.6%      14498 ±  4%  fio.time.maximum_resident_set_size
>     116.00          -8.2%     106.50        fio.time.percent_of_cpu_this_job_got
>  1.998e+10         +11.4%  2.226e+10        cpuidle..time
>       3.27 ±  3%    +4.6%       3.42        iostat.cpu.iowait
>       4.49 ± 68%     -2.1       2.38 ±152%  perf-profile.children.cycles-pp.number
>       4.49 ± 68%     -2.5       1.98 ±175%  perf-profile.self.cycles-pp.number
>     557763          +5.4%     587781        proc-vmstat.pgfault
>      25488          +3.1%      26274        proc-vmstat.pgreuse
>    2459048         -10.1%    2209482        vmstat.io.bi
>     184649 ±  5%   -10.4%     165526 ±  7%  vmstat.system.cs
>     111733 ± 30%   +61.8%     180770 ± 21%  numa-meminfo.node0.AnonPages
>     113221 ± 30%   +60.2%     181416 ± 21%  numa-meminfo.node0.Inactive(anon)
>      11301 ± 24%  +164.5%      29888 ±117%  numa-meminfo.node2.Active(file)
>     104911 ± 39%   -80.5%      20456 ±100%  numa-meminfo.node3.AnonHugePages
>     131666 ± 27%   -67.9%      42297 ± 82%  numa-meminfo.node3.AnonPages
>     132698 ± 26%   -67.5%      43158 ± 81%  numa-meminfo.node3.Inactive(anon)
>      27934 ± 30%   +61.8%      45196 ± 21%  numa-vmstat.node0.nr_anon_pages
>      28306 ± 30%   +60.2%      45358 ± 21%  numa-vmstat.node0.nr_inactive_anon
>      28305 ± 30%   +60.2%      45357 ± 21%  numa-vmstat.node0.nr_zone_inactive_anon
>       6291 ± 24%   +68.0%      10567 ± 26%  numa-vmstat.node2.workingset_nodes
>      32925 ± 27%   -67.9%      10571 ± 82%  numa-vmstat.node3.nr_anon_pages
>      33182 ± 26%   -67.5%      10786 ± 81%  numa-vmstat.node3.nr_inactive_anon
>      33182 ± 26%   -67.5%      10786 ± 81%  numa-vmstat.node3.nr_zone_inactive_anon
>     161.78 ±  4%   -28.2%     116.10 ± 30%  sched_debug.cfs_rq:/.runnable_avg.avg
>     161.46 ±  4%   -28.2%     115.85 ± 30%  sched_debug.cfs_rq:/.util_avg.avg
>     426382         +11.0%     473345 ±  6%  sched_debug.cpu.clock.avg
>     426394         +11.0%     473357 ±  6%  sched_debug.cpu.clock.max
>     426370         +11.0%     473331 ±  6%  sched_debug.cpu.clock.min
>     426139         +10.9%     472586 ±  6%  sched_debug.cpu.clock_task.avg
>     426368         +11.0%     473130 ±  6%  sched_debug.cpu.clock_task.max
>     416196         +11.1%     462228 ±  6%  sched_debug.cpu.clock_task.min
>       1156 ±  7%   -10.8%       1031 ±  6%  sched_debug.cpu.curr->pid.stddev
>     426372         +11.0%     473334 ±  6%  sched_debug.cpu_clk
>     425355         +11.0%     472318 ±  6%  sched_debug.ktime
>     426826         +11.0%     473787 ±  6%  sched_debug.sched_clk
>  1.263e+09          -7.9%  1.164e+09 ±  3%  perf-stat.i.branch-instructions
>     190886 ±  5%   -10.8%     170290 ±  7%  perf-stat.i.context-switches
>  1.979e+09          -8.8%  1.804e+09 ±  2%  perf-stat.i.dTLB-loads
>  8.998e+08          -8.2%  8.257e+08 ±  2%  perf-stat.i.dTLB-stores
>  6.455e+09          -8.0%  5.938e+09 ±  3%  perf-stat.i.instructions
>      21.78          -8.4%      19.95        perf-stat.i.metric.M/sec
>    7045315 ±  4%   -14.0%    6057863 ±  6%  perf-stat.i.node-load-misses
>    2658563 ±  7%   -21.9%    2077647 ± 12%  perf-stat.i.node-loads
>     414822 ±  4%   -12.9%     361455 ±  3%  perf-stat.i.node-store-misses
>  1.251e+09          -7.8%  1.154e+09 ±  3%  perf-stat.ps.branch-instructions
>     189082 ±  5%   -10.7%     168849 ±  7%  perf-stat.ps.context-switches
>   1.96e+09          -8.8%  1.789e+09 ±  2%  perf-stat.ps.dTLB-loads
>  8.912e+08          -8.1%  8.187e+08 ±  2%  perf-stat.ps.dTLB-stores
>  6.393e+09          -7.9%  5.888e+09 ±  3%  perf-stat.ps.instructions
>    6978485 ±  4%   -13.9%    6006510 ±  6%  perf-stat.ps.node-load-misses
>    2633627 ±  7%   -21.8%    2060033 ± 12%  perf-stat.ps.node-loads
>     410822 ±  4%   -12.8%     358289 ±  3%  perf-stat.ps.node-store-misses
>
>
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <yujie.liu@intel.com>
> | Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com
>
>
> To reproduce:
>
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         sudo bin/lkp install job.yaml           # job file is attached in this email
>         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>         sudo bin/lkp run generated-yaml-file
>
>         # if come across any failure that blocks the test,
>         # please remove ~/.lkp and /lkp dir to run from a clean state.
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
* Re: [LKP] Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
  2022-10-08  8:00 ` [blk] 8c5035dfbb: fio.read_iops -10.6% regression Yu Kuai
@ 2022-10-09  5:47   ` Yin Fengwei
  2022-10-09  6:14     ` Yu Kuai
  2022-10-09  8:43   ` Ming Lei
  1 sibling, 1 reply; 6+ messages in thread
From: Yin Fengwei @ 2022-10-09  5:47 UTC
To: Yu Kuai, kernel test robot
Cc: lkp, lkp, Jens Axboe, linux-kernel, linux-block, yukuai (C)

Hi Kuai,

On 10/8/22 16:00, Yu Kuai wrote:
> Hi,
>
> On 2022/10/08 10:50, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
>
> I don't know how this is working but I'm *sure* this commit won't affect
> performance. Please take a look at the commit, only wbt initialization
> is touched, which is done while creating the device:
>
> device_add_disk
>  blk_register_queue
>   wbt_enable_default
>    wbt_init
>
> And io path is the same with or without this commit.
>
> By the way, wbt should only work for write.

Some information here:
It looks like the line
	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
matters.

If only this line is moved back to its original position, based on 8c5035dfbb,
the regression is gone.

If only this line is moved before ret = rq_qos_add() (just as your patch
did, but with only this line), based on 8c5035dfbb, the regression can
be reproduced.

Regards
Yin, Fengwei
* Re: [LKP] Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
  2022-10-09  5:47   ` [LKP] " Yin Fengwei
@ 2022-10-09  6:14     ` Yu Kuai
  0 siblings, 0 replies; 6+ messages in thread
From: Yu Kuai @ 2022-10-09  6:14 UTC
To: Yin Fengwei, Yu Kuai, kernel test robot
Cc: lkp, lkp, Jens Axboe, linux-kernel, linux-block, yukuai (C)

Hi,

On 2022/10/09 13:47, Yin Fengwei wrote:
> Hi Kuai,
>
> On 10/8/22 16:00, Yu Kuai wrote:
>> By the way, wbt should only work for write.
> Some information here:
> It looks like the line
> 	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
> matters.
>
> If move only this line to original position based on 8c5035dfbb,
> the regression is gone.
>
> If move only this line before ret = rq_qos_add() (just like your patch
> did, but only with this line) based on 8c5035dfbb, the regression can
> be reproduced.
>

Thanks for the information, but I still don't understand whether anything
differs once wbt_init() is done, or how reads are affected by wbt. 🙁
* Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
  2022-10-08  8:00 ` [blk] 8c5035dfbb: fio.read_iops -10.6% regression Yu Kuai
  2022-10-09  5:47   ` [LKP] " Yin Fengwei
@ 2022-10-09  8:43   ` Ming Lei
  2022-10-09  9:32     ` Yu Kuai
  1 sibling, 1 reply; 6+ messages in thread
From: Ming Lei @ 2022-10-09  8:43 UTC
To: Yu Kuai
Cc: kernel test robot, lkp, lkp, Jens Axboe, linux-kernel, linux-block,
	ying.huang, feng.tang, zhengjun.xing, fengwei.yin, yukuai (C)

On Sat, Oct 08, 2022 at 04:00:10PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2022/10/08 10:50, kernel test robot wrote:
> > Greeting,
> >
> > FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
>
> I don't know how this is working but I'm *sure* this commit won't affect

It looks wrong to move

	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));

before rq_qos_add() in wbt_init().

Until the wbt rq_qos has been added, wbt_set_write_cache() is just a nop.

thanks,
Ming
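Ming Lei's point about ordering can be illustrated with a small userspace sketch. This is not the kernel code: the struct layouts and the `model_*` names are hypothetical, and the default of `wc = true` is an assumption chosen to match the later observation in this thread that 'rwb->wc' is 1 after the commit. It only shows why a setter that looks its target up through the queue does nothing when called before registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct rq_wb {
	bool wc;            /* "device has a write cache" flag */
};

struct request_queue {
	struct rq_wb *wbt;  /* NULL until the wbt policy is registered */
	bool queue_flag_wc; /* models test_bit(QUEUE_FLAG_WC, &q->queue_flags) */
};

/* Models rq_qos_add(): makes the policy reachable from the queue. */
static void model_rq_qos_add(struct request_queue *q, struct rq_wb *rwb)
{
	assert(q != NULL && rwb != NULL);
	q->wbt = rwb;
}

/* Models wbt_set_write_cache(): silently a no-op if wbt is not registered. */
static void model_wbt_set_write_cache(struct request_queue *q, bool wc)
{
	if (q->wbt)
		q->wbt->wc = wc;
}

/* Ordering before 8c5035dfbb: register first, then set the flag. */
static bool model_init_add_then_set(struct request_queue *q, struct rq_wb *rwb)
{
	rwb->wc = true;     /* assumed default, see lead-in */
	model_rq_qos_add(q, rwb);
	model_wbt_set_write_cache(q, q->queue_flag_wc);
	return rwb->wc;
}

/* Ordering after 8c5035dfbb: the setter runs while q->wbt is still NULL. */
static bool model_init_set_then_add(struct request_queue *q, struct rq_wb *rwb)
{
	rwb->wc = true;     /* assumed default, see lead-in */
	model_wbt_set_write_cache(q, q->queue_flag_wc); /* no-op here */
	model_rq_qos_add(q, rwb);
	return rwb->wc;
}
```

For a queue whose write-cache flag is clear, the first ordering ends with `wc` false, while the second leaves `wc` at its assumed default of true because the setter never reached the structure.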
* Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
  2022-10-09  8:43   ` Ming Lei
@ 2022-10-09  9:32     ` Yu Kuai
  2022-10-13  7:42       ` Feng Tang
  0 siblings, 1 reply; 6+ messages in thread
From: Yu Kuai @ 2022-10-09  9:32 UTC
To: Ming Lei, Yu Kuai
Cc: kernel test robot, lkp, lkp, Jens Axboe, linux-kernel, linux-block,
	ying.huang, feng.tang, zhengjun.xing, fengwei.yin, yukuai (C)

Hi,

On 2022/10/09 16:43, Ming Lei wrote:
> On Sat, Oct 08, 2022 at 04:00:10PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2022/10/08 10:50, kernel test robot wrote:
>>> Greeting,
>>>
>>> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
>>
>> I don't know how this is working but I'm *sure* this commit won't affect
>
> Looks it is wrong to move
>
> 	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
>
> before rq_qos_add() in wbt_init().
>
> Without adding wbt rq_qos, wbt_set_write_cache is just a nop.

Yes, I get it now; I was being foolish here.

I missed that "rwb->wc" is reached through the rq_qos in wbt_set_write_cache(),
and that rq_qos is NULL before rq_qos_add(). By the way, it's interesting that
read performance is affected; I still don't know why...

Thanks,
Kuai
* Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
  2022-10-09  9:32     ` Yu Kuai
@ 2022-10-13  7:42       ` Feng Tang
  0 siblings, 0 replies; 6+ messages in thread
From: Feng Tang @ 2022-10-13  7:42 UTC
To: Yu Kuai
Cc: Ming Lei, Liu, Yujie, lkp@lists.01.org, lkp, Jens Axboe,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, Huang, Ying,
	zhengjun.xing@linux.intel.com, Yin, Fengwei, yukuai (C)

On Sun, Oct 09, 2022 at 05:32:34PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2022/10/09 16:43, Ming Lei wrote:
> > Looks it is wrong to move
> >
> > 	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
> >
> > before rq_qos_add() in wbt_init().
> >
> > Without adding wbt rq_qos, wbt_set_write_cache is just a nop.
>
> Yes, I got it now, I'm being foolish here.
>
> I missed that "rwb->wc" is got by rq_qos in wbt_set_write_cache(), which
> is NULL before rq_qos_add(). By the way, it's interesting that how read
> performance is affected, I still don't know why yet...

Indeed, we were confused too. So we did some further checking and found
that it could be related to fewer calls to wake_up_all(), caused by the
changed 'rwb->wc' value.

I'm not familiar with the block layer and VFS, and only checked
'blk-wbt.c'. Before commit 8c5035dfbb, 'rwb->wc' is 0 in 0Day's test
environment, while it is 1 after the commit.

Inside wbt_rqw_done(), 'rwb->wc' is used to decide whether to wake up
other waiters in the system, so we added some debug code to count the
wakeups and the skipped wakeups:

----------------------------------------------------------------
@@ -130,6 +130,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
 	/*
 	 * Don't wake anyone up if we are above the normal limit.
 	 */
-	if (inflight && inflight >= limit)
+	if (inflight && inflight >= limit) {
+		skip_wakeup++;
 		return;
+	}

 	if (wq_has_sleeper(&rqw->wait)) {
 		int diff = limit - inflight;

-		if (!inflight || diff >= rwb->wb_background / 2)
+		if (!inflight || diff >= rwb->wb_background / 2) {
+			wakeup++;
 			wake_up_all(&rqw->wait);
+		}
----------------------------------------------------------------

After the fio task, the 'skip_wakeup' count is much bigger with the patch:

before patch:
	422.274394: wbt_rqw_done: wakeup_skip=19408 wakup_all=1944759

after patch:
	433.753345: wbt_rqw_done: wakeup_skip=2090585 wakup_all=13630

Hope this helps with root-causing.

Thanks,
Feng
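The counters above can be read against a small userspace model of the wake-up decision. This is a sketch under stated assumptions, not the kernel implementation: the `model_*` names, the enum and the struct are hypothetical, and the limit selection (background depth for discards, 0 when `wc` is set and there was no recent wait, the normal depth otherwise) is a reading of blk-wbt.c from around this kernel version that is not quoted in the thread. The two `if` conditions mirror the lines shown in the debug diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels matching the two debug counters. */
enum model_wake {
	MODEL_SKIP_WAKEUP, /* corresponds to skip_wakeup++ */
	MODEL_WAKE_ALL,    /* corresponds to wakeup++ and wake_up_all() */
	MODEL_NO_ACTION    /* below the limit, but nothing to wake or too early */
};

/* Hypothetical subset of struct rq_wb used by the decision. */
struct model_rwb {
	bool wc;           /* write-cache flag discussed in this thread */
	int wb_normal;     /* normal depth limit */
	int wb_background; /* background depth limit */
};

/* Assumed limit selection, see the lead-in. */
static int model_limit(const struct model_rwb *rwb, bool is_discard,
		       bool recent_wait)
{
	if (is_discard)
		return rwb->wb_background;
	if (rwb->wc && !recent_wait)
		return 0;
	return rwb->wb_normal;
}

/* Decision taken when a tracked request completes. */
static enum model_wake model_rqw_done(const struct model_rwb *rwb,
				      int inflight, bool is_discard,
				      bool recent_wait, bool has_sleeper)
{
	int limit = model_limit(rwb, is_discard, recent_wait);

	/* Don't wake anyone up if we are above the limit. */
	if (inflight && inflight >= limit)
		return MODEL_SKIP_WAKEUP;

	if (has_sleeper) {
		int diff = limit - inflight;

		if (!inflight || diff >= rwb->wb_background / 2)
			return MODEL_WAKE_ALL;
	}
	return MODEL_NO_ACTION;
}
```

Under these assumptions, with `wc` set and no recent wait the limit is 0, so every completion that leaves requests in flight takes the skip path, which would be consistent with the much larger wakeup_skip count after the patch. The depth values used to exercise the model are arbitrary.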
Thread overview: 6+ messages
[not found] <202210081045.77ddf59b-yujie.liu@intel.com>
2022-10-08 8:00 ` [blk] 8c5035dfbb: fio.read_iops -10.6% regression Yu Kuai
2022-10-09 5:47 ` [LKP] " Yin Fengwei
2022-10-09 6:14 ` Yu Kuai
2022-10-09 8:43 ` Ming Lei
2022-10-09 9:32 ` Yu Kuai
2022-10-13 7:42 ` Feng Tang