* Observing higher CPU utilization during random IO fio testing
@ 2026-05-21 19:44 Wen Xiong
  2026-05-21 21:52 ` Jens Axboe
  2026-05-30  1:10 ` Ming Lei
  0 siblings, 2 replies; 8+ messages in thread
From: Wen Xiong @ 2026-05-21 19:44 UTC
  To: linux-block, axboe; +Cc: tom.leiming, jmoyer, Gjoyce, wenxiong

Hi All,

Our performance team observed higher CPU utilization in RHEL10 compared to RHEL9.8, and observed a similar issue in the upstream kernel (v7.1-rc4) as well, when running FIO random IO tests.

System configuration:
47 dedicated cores
120 GB memory
PCIe4 2-Port 64Gb FC Adapter
FlashSystem: FS9500, 12 LUNs/FC port, 100G each LUN.

Random IO tests are more CPU intensive than sequential IO tests due to several factors: more context switching, interrupt handling, cache inefficiency, etc. We found that the following patch caused the higher CPU utilization in RHEL10 and newer Linux kernels:

commit 060406c61c7cb4bbd82a02d179decca9c9bb3443 (HEAD)
Author: Yu Kuai <yukuai3@huawei.com>
Date:   Thu May 9 20:38:25 2024 +0800

    block: add plug while submitting IO

    So that if caller didn't use plug, for example, __blkdev_direct_IO_simple()
    and __blkdev_direct_IO_async(), block layer can still benefit from caching
    nsec time in the plug.

    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Link: https://lore.kernel.org/r/20240509123825.3225207-1-yukuai1@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

We reverted the above patch in the RHEL10 kernel and in upstream 7.1-rc4, and saw lower CPU utilization when running the same FIO test.

The patch adds plugging in __submit_bio() in the block layer, which may cause performance degradation:
- Random IO tests have less merging, and plugging adds flush overhead.
- More IO scheduler interaction: requests are forced through the scheduler instead of direct dispatch to the hardware queue.
- Poor cache locality during the plug operation.

Below is some performance data that our performance team collected:

RHEL9.8 vs. RHEL10.0:
Iotype   qd   nj   rmix   mpstat busy delta   lparstat delta
Randrw   1    20   100    135%                109%
Randrw   1    40   100    72%                 81%
Randrw   1    20   70     278%                174%
Randrw   1    40   70     272%                191%
Randrw   1    20   0      93%                 30%
Randrw   1    40   0      104%                36%

RHEL9.8 vs. RHEL10 with the above plugging patch reverted in the block layer:
Iotype   qd   nj   rmix   mpstat busy delta   lparstat delta
Randrw   1    20   100    -12%                20%
Randrw   1    40   100    -42%                -4%
Randrw   1    20   70     70%                 71%
Randrw   1    40   70     51%                 60%
Randrw   1    20   0      -14%                -43%
Randrw   1    40   0      -33%                -51%

Can a block layer expert help us resolve this high CPU utilization performance issue?
Let us know if you need more performance data or other perf data.

Thanks a lot for your help!
Wendy
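
The two tables in the message above can be put side by side to see how far each delta moved once the patch was reverted. The following is a minimal Python sketch that only re-enters the quoted figures and subtracts them; the variable and function names are illustrative, and the `%51` entry in the original is read here as 51%, which is an assumption.

```python
# Rows are keyed by (qd, nj, rmix); values are (mpstat busy delta, lparstat delta)
# in percent, copied from the two tables in the message above.
with_patch = {
    (1, 20, 100): (135, 109),
    (1, 40, 100): (72, 81),
    (1, 20, 70): (278, 174),
    (1, 40, 70): (272, 191),
    (1, 20, 0): (93, 30),
    (1, 40, 0): (104, 36),
}
patch_reverted = {
    (1, 20, 100): (-12, 20),
    (1, 40, 100): (-42, -4),
    (1, 20, 70): (70, 71),
    (1, 40, 70): (51, 60),  # "%51" in the original, assumed to mean 51%
    (1, 20, 0): (-14, -43),
    (1, 40, 0): (-33, -51),
}


def revert_effect(before, after):
    """Return, per row, the change in percentage points after the revert."""
    return {
        key: tuple(a - b for a, b in zip(after[key], before[key]))
        for key in before
    }


effect = revert_effect(with_patch, patch_reverted)
for (qd, nj, rmix), (mpstat_pp, lparstat_pp) in effect.items():
    print(f"qd={qd} nj={nj} rmix={rmix}: "
          f"mpstat {mpstat_pp:+d} pp, lparstat {lparstat_pp:+d} pp")
```

Every row comes out negative in both columns, which matches the statement that reverting the patch lowered CPU utilization in all six configurations.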

* Re: Observing higher CPU utilization during random IO fio testing
  2026-05-21 19:44 Observing higher CPU utilization during random IO fio testing Wen Xiong
@ 2026-05-21 21:52 ` Jens Axboe
  2026-05-25  5:28   ` Yu Kuai
  2026-05-30  1:10 ` Ming Lei
  1 sibling, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2026-05-21 21:52 UTC
  To: Wen Xiong, linux-block; +Cc: tom.leiming, jmoyer, Gjoyce, wenxiong, Yu Kuai

On 5/21/26 1:44 PM, Wen Xiong wrote:
> Hi All,
>
> Our performance team observed higher CPU utilization in RHEL10 compared to RHEL9.8, and observed a similar issue in the upstream kernel (v7.1-rc4) as well, when running FIO random IO tests.
>
> System configuration:
> 47 dedicated cores
> 120 GB memory
> PCIe4 2-Port 64Gb FC Adapter
> FlashSystem: FS9500, 12 LUNs/FC port, 100G each LUN.
>
> Random IO tests are more CPU intensive than sequential IO tests due to several factors: more context switching, interrupt handling, cache inefficiency, etc. We found that the following patch caused the higher CPU utilization in RHEL10 and newer Linux kernels:
>
> commit 060406c61c7cb4bbd82a02d179decca9c9bb3443 (HEAD)
> Author: Yu Kuai <yukuai3@huawei.com>
> Date:   Thu May 9 20:38:25 2024 +0800
>
>     block: add plug while submitting IO
>
>     So that if caller didn't use plug, for example, __blkdev_direct_IO_simple()
>     and __blkdev_direct_IO_async(), block layer can still benefit from caching
>     nsec time in the plug.
>
>     Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>     Link: https://lore.kernel.org/r/20240509123825.3225207-1-yukuai1@huaweicloud.com
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> We reverted the above patch in the RHEL10 kernel and in upstream 7.1-rc4, and saw lower CPU utilization when running the same FIO test.
>
> The patch adds plugging in __submit_bio() in the block layer, which may cause performance degradation:
> - Random IO tests have less merging, and plugging adds flush overhead.
> - More IO scheduler interaction: requests are forced through the scheduler instead of direct dispatch to the hardware queue.
> - Poor cache locality during the plug operation.
>
> Below is some performance data that our performance team collected:
>
> RHEL9.8 vs. RHEL10.0:
> Iotype   qd   nj   rmix   mpstat busy delta   lparstat delta
> Randrw   1    20   100    135%                109%
> Randrw   1    40   100    72%                 81%
> Randrw   1    20   70     278%                174%
> Randrw   1    40   70     272%                191%
> Randrw   1    20   0      93%                 30%
> Randrw   1    40   0      104%                36%
>
> RHEL9.8 vs. RHEL10 with the above plugging patch reverted in the block layer:
> Iotype   qd   nj   rmix   mpstat busy delta   lparstat delta
> Randrw   1    20   100    -12%                20%
> Randrw   1    40   100    -42%                -4%
> Randrw   1    20   70     70%                 71%
> Randrw   1    40   70     51%                 60%
> Randrw   1    20   0      -14%                -43%
> Randrw   1    40   0      -33%                -51%
>
> Can a block layer expert help us resolve this high CPU utilization performance issue?
> Let us know if you need more performance data or other perf data.

Let's CC Yu Kuai who wrote that commit, that might help.

--
Jens Axboe

* Re: Observing higher CPU utilization during random IO fio testing
  2026-05-21 21:52 ` Jens Axboe
@ 2026-05-25  5:28   ` Yu Kuai
  2026-05-26 15:28     ` Wen Xiong
  2026-05-29 17:13     ` Wen Xiong
  0 siblings, 2 replies; 8+ messages in thread
From: Yu Kuai @ 2026-05-25 5:28 UTC
  To: Jens Axboe, Wen Xiong, linux-block
  Cc: tom.leiming, jmoyer, Gjoyce, wenxiong, yukuai

Hi,

On 2026/5/22 5:52, Jens Axboe wrote:
> On 5/21/26 1:44 PM, Wen Xiong wrote:
>> Hi All,
>>
>> Our performance team observed higher CPU utilization in RHEL10 compared to RHEL9.8, and observed a similar issue in the upstream kernel (v7.1-rc4) as well, when running FIO random IO tests.
>>
>> System configuration:
>> 47 dedicated cores
>> 120 GB memory
>> PCIe4 2-Port 64Gb FC Adapter
>> FlashSystem: FS9500, 12 LUNs/FC port, 100G each LUN.
>>
>> Random IO tests are more CPU intensive than sequential IO tests due to several factors: more context switching, interrupt handling, cache inefficiency, etc. We found that the following patch caused the higher CPU utilization in RHEL10 and newer Linux kernels:
>>
>> commit 060406c61c7cb4bbd82a02d179decca9c9bb3443 (HEAD)
>> Author: Yu Kuai <yukuai3@huawei.com>
>> Date:   Thu May 9 20:38:25 2024 +0800
>>
>>     block: add plug while submitting IO
>>
>>     So that if caller didn't use plug, for example, __blkdev_direct_IO_simple()
>>     and __blkdev_direct_IO_async(), block layer can still benefit from caching
>>     nsec time in the plug.
>>
>>     Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>     Link: https://lore.kernel.org/r/20240509123825.3225207-1-yukuai1@huaweicloud.com
>>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> We reverted the above patch in the RHEL10 kernel and in upstream 7.1-rc4, and saw lower CPU utilization when running the same FIO test.
>>
>> The patch adds plugging in __submit_bio() in the block layer, which may cause performance degradation:
>> - Random IO tests have less merging, and plugging adds flush overhead.
>> - More IO scheduler interaction: requests are forced through the scheduler instead of direct dispatch to the hardware queue.

I don't understand this point. Can you explain more? I think the plug should not matter for whether requests go through the scheduler or not.

>> - Poor cache locality during the plug operation.
>>
>> Below is some performance data that our performance team collected:
>>
>> RHEL9.8 vs. RHEL10.0:
>> Iotype   qd   nj   rmix   mpstat busy delta   lparstat delta
>> Randrw   1    20   100    135%                109%
>> Randrw   1    40   100    72%                 81%
>> Randrw   1    20   70     278%                174%
>> Randrw   1    40   70     272%                191%
>> Randrw   1    20   0      93%                 30%
>> Randrw   1    40   0      104%                36%
>>
>> RHEL9.8 vs. RHEL10 with the above plugging patch reverted in the block layer:
>> Iotype   qd   nj   rmix   mpstat busy delta   lparstat delta
>> Randrw   1    20   100    -12%                20%
>> Randrw   1    40   100    -42%                -4%
>> Randrw   1    20   70     70%                 71%
>> Randrw   1    40   70     51%                 60%
>> Randrw   1    20   0      -14%                -43%
>> Randrw   1    40   0      -33%                -51%
>>
>> Can a block layer expert help us resolve this high CPU utilization performance issue?

And I assume you're testing raw disks, because filesystems should always enable plug.

>> Let us know if you need more performance data or other perf data.

Yes, perf data will be helpful. Please also show your test in detail and I'll check if I can reproduce it.

> Let's CC Yu Kuai who wrote that commit, that might help.
>

--
Thanks,
Kuai

* Re: Observing higher CPU utilization during random IO fio testing
  2026-05-25  5:28   ` Yu Kuai
@ 2026-05-26 15:28     ` Wen Xiong
  2026-05-29 17:13     ` Wen Xiong
  1 sibling, 0 replies; 8+ messages in thread
From: Wen Xiong @ 2026-05-26 15:28 UTC
  To: yukuai; +Cc: Jens Axboe, linux-block, tom.leiming, jmoyer, Gjoyce, wenxiong

On 2026-05-25 00:28, Yu Kuai wrote:
> Hi,
>
> On 2026/5/22 5:52, Jens Axboe wrote:
>>> - More IO scheduler interaction, forces requests through scheduler
>>> instead of direct dispatch(direct dispatch to hardware queue)
> I don't understand this point. Can you explain more? I think plug
> should not matter if request go through scheduler or not.

My understanding is:
Random IO tests are more CPU intensive. The plug delays dispatching IOs directly to the hardware queue (the quick path). With a plug, multiple IO requests are collected into a batch and submission is deferred until blk_flush_plug() is called (dispatch to the hardware queue) or the task gets scheduled out.

>
> And I assume you're testing raw disk, because filesystems should
> always enable plug.
>

Yes. FIO random IO tests over raw disks.

> Yes, perf data will be helpful. And please show your test in details
> and I'll
> check if I can reproduce it.

System config:
47 dedicated cores
120 GB memory
PCIe4 2-Port 64Gb FC Adapter
64Gb FC switch
FlashSystem: FS9500, 12 LUNs/FC port

Below is the fio config for rwmixread=100:

[global]
randrepeat=0
buffered=0
direct=1
norandommap=1
group_reporting=1
size=80g
ioengine=libaio
rw=randrw
bs=4k
iodepth=1
rwmixread=100
runtime=600
ramp_time=5
time_based=1
numjobs=20

[job1]
filename=/dev/dm-2

[job2]
filename=/dev/dm-3
...
24 jobs in total.

We collected some perf data. What kind of perf data do you want? Let me know.

Thanks,
Wendy
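
The job file above changes only in `rwmixread` and `numjobs` across the test matrix reported earlier in the thread (rmix 100/70/0, nj 20/40). As a rough sketch, and not the script the performance team actually used, a small Python helper could render one job file per combination. The function name `build_fio_job` is hypothetical, and only the two device names shown in the message are used, since the remaining job sections are elided in the original.

```python
def build_fio_job(devices, rwmixread=100, numjobs=20):
    """Render a fio job file using the [global] options quoted in the message."""
    global_opts = [
        ("randrepeat", 0), ("buffered", 0), ("direct", 1), ("norandommap", 1),
        ("group_reporting", 1), ("size", "80g"), ("ioengine", "libaio"),
        ("rw", "randrw"), ("bs", "4k"), ("iodepth", 1),
        ("rwmixread", rwmixread), ("runtime", 600), ("ramp_time", 5),
        ("time_based", 1), ("numjobs", numjobs),
    ]
    lines = ["[global]"]
    lines += [f"{key}={value}" for key, value in global_opts]
    # One [jobN] section per device, numbered from 1 as in the message.
    for index, device in enumerate(devices, start=1):
        lines += ["", f"[job{index}]", f"filename={device}"]
    return "\n".join(lines) + "\n"


# Only the device names that appear in the message; the rest are not guessed.
job_text = build_fio_job(["/dev/dm-2", "/dev/dm-3"], rwmixread=70, numjobs=40)
print(job_text)
```

Writing `job_text` to a file and passing that file to `fio` would then run the rmix=70, nj=40 case against the listed devices.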

* Re: Observing higher CPU utilization during random IO fio testing
  2026-05-25  5:28   ` Yu Kuai
  2026-05-26 15:28     ` Wen Xiong
@ 2026-05-29 17:13     ` Wen Xiong
  2026-05-31 11:45       ` Yu Kuai
  1 sibling, 1 reply; 8+ messages in thread
From: Wen Xiong @ 2026-05-29 17:13 UTC
  To: yukuai; +Cc: Jens Axboe, linux-block, tom.leiming, jmoyer, Gjoyce, wenxiong

On 2026-05-25 00:28, Yu Kuai wrote:

> On 2026/5/22 5:52, Jens Axboe wrote:
> Yes, perf data will be helpful. And please show your test in details
> and I'll
> check if I can reproduce it.

Hi Yu Kuai,
Have you reproduced the issue yet?

Below is some perf data we took while running the random read test:

Test:
FIO random read with qdepth=1 nj=20; we saw higher CPU utilization in this test case.

Perf record:
Start the fio run in one session and kick off the script in another session while the test is running.

Perf report:
With blk_start_plug/blk_finish_plug before calling __submit_bio() in blk-core.c:
Top.txt
     2.41%  fio  [kernel.kallsyms]  [k] cpupri_set
     1.16%  fio  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
     0.75%  fio  [kernel.kallsyms]  [k] sbitmap_find_bit
     0.47%  fio  [kernel.kallsyms]  [k] set_next_task_rt
     0.41%  fio  [kernel.kallsyms]  [k] pull_rt_task
     0.34%  fio  [kernel.kallsyms]  [k] enqueue_pushable_task
     …
     0.02%  fio  [kernel.kallsyms]  [k] __blk_flush_plug
     0.01%  fio  [kernel.kallsyms]  [k] blk_add_rq_to_plug
     0.01%  fio  [kernel.kallsyms]  [k] blk_mq_flush_plug_list
     0.00%  fio  [kernel.kallsyms]  [k] blk_attempt_plug_merge

Callgraph.txt

     2.41%  fio  [kernel.kallsyms]  [k] cpupri_set
            |
            ---cpupri_set
               |
               |--1.15%--__enqueue_rt_entity
               |          enqueue_task_rt
               |          enqueue_task
               |          ttwu_do_activate

Perf report
Without blk_start_plug and blk_finish_plug before calling __submit_bio():
Top.txt
     0.67%  fio  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
     0.64%  fio  [kernel.kallsyms]  [k] sched_balance_newidle
     0.47%  fio  [kernel.kallsyms]  [k] _raw_spin_lock
     0.39%  fio  [kernel.kallsyms]  [k] sbitmap_find_bit
     0.35%  fio  [kernel.kallsyms]  [k] cpupri_set
     0.28%  fio  [kernel.kallsyms]  [k] work_grab_pending
     0.24%  fio  [kernel.kallsyms]  [k] lookup_ioctx
     0.23%  fio  [kernel.kallsyms]  [k] __schedule
     …
     …
     0.00%  fio  [kernel.kallsyms]  [k] blk_attempt_plug_merge

Call graph.txt:

     0.35%  fio  [kernel.kallsyms]  [k] cpupri_set
            |
            ---cpupri_set
               |
               |--0.17%--arch_local_irq_restore.part.0
               |          |
               |          |--0.14%--finish_task_switch.isra.0
               |          |          __schedule
               |          |          |
               |          |          |--0.13%--schedule
               |          |          |          |
               |          |          |          |--0.07%--read_events
               …..
               |--0.13%--__enqueue_rt_entity
               |          enqueue_task_rt
               |          enqueue_task
               |          ttwu_do_activate

From the above perf data, it looks like:
1. More time is spent in cpupri_set(): tasks are being enqueued/dequeued frequently, more IO scheduling.
2. More plug routines are called.

If you need the full perf data report, I can email/attach the full report.

Thanks for your help!
Wendy
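
The two Top.txt listings above can be compared symbol by symbol. The following Python sketch is one possible way to do that, assuming each line has the `percent command [object] [k] symbol` shape shown in the message; the regular expression and the function name `parse_top` are illustrative, and only a few of the quoted lines are re-entered as sample input.

```python
import re

# Matches lines such as "2.41% fio [kernel.kallsyms] [k] cpupri_set".
LINE_RE = re.compile(r"^\s*(\d+\.\d+)%\s+\S+\s+\[[^\]]+\]\s+\[\w\]\s+(\S+)\s*$")


def parse_top(text):
    """Map symbol name -> overhead percentage for each matching line."""
    overhead = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            overhead[match.group(2)] = float(match.group(1))
    return overhead


# Sample lines copied from the "with plug" listing.
with_plug = parse_top("""
2.41% fio [kernel.kallsyms] [k] cpupri_set
1.16% fio [kernel.kallsyms] [k] queued_spin_lock_slowpath
0.75% fio [kernel.kallsyms] [k] sbitmap_find_bit
""")
# Sample lines copied from the "without plug" listing.
without_plug = parse_top("""
0.67% fio [kernel.kallsyms] [k] queued_spin_lock_slowpath
0.39% fio [kernel.kallsyms] [k] sbitmap_find_bit
0.35% fio [kernel.kallsyms] [k] cpupri_set
""")

# Report only symbols present in both listings.
for symbol in sorted(with_plug.keys() & without_plug.keys()):
    extra = with_plug[symbol] - without_plug[symbol]
    print(f"{symbol}: with plug {with_plug[symbol]:.2f}%, "
          f"without plug {without_plug[symbol]:.2f}%, extra {extra:+.2f} pp")
```

On these samples the largest gap is in cpupri_set (2.41% against 0.35%), which is the symbol the message singles out.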

* Re: Observing higher CPU utilization during random IO fio testing
  2026-05-29 17:13     ` Wen Xiong
@ 2026-05-31 11:45       ` Yu Kuai
  0 siblings, 0 replies; 8+ messages in thread
From: Yu Kuai @ 2026-05-31 11:45 UTC
  To: Wen Xiong
  Cc: Jens Axboe, linux-block, tom.leiming, jmoyer, Gjoyce, wenxiong, yukuai

Hi,

On 2026/5/30 1:13, Wen Xiong wrote:
> On 2026-05-25 00:28, Yu Kuai wrote:
>
>> On 2026/5/22 5:52, Jens Axboe wrote:
>> Yes, perf data will be helpful. And please show your test in details
>> and I'll
>> check if I can reproduce it.
>
> Hi Yu Kuai,
> Have you reproduced the issue yet?

I don't have exactly the same result, but yes, I think I can reproduce it. And considering this is a raw disk test with qd=1, unless some blkio QoS policy is enabled and the extra ktime_get() calls can be optimized, it's expected that there will be extra CPU overhead.

>
> Below is some perf data we took while running the random read test:
>
> Test:
> FIO random read with qdepth=1 nj=20; we saw higher CPU utilization in this test case.
>
> Perf record:
> Start the fio run in one session and kick off the script in another session while the test is running.
>
> Perf report:
> With blk_start_plug/blk_finish_plug before calling __submit_bio() in blk-core.c:
> Top.txt
>      2.41%  fio  [kernel.kallsyms]  [k] cpupri_set
>      1.16%  fio  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
>      0.75%  fio  [kernel.kallsyms]  [k] sbitmap_find_bit
>      0.47%  fio  [kernel.kallsyms]  [k] set_next_task_rt
>      0.41%  fio  [kernel.kallsyms]  [k] pull_rt_task
>      0.34%  fio  [kernel.kallsyms]  [k] enqueue_pushable_task
>      …
>      0.02%  fio  [kernel.kallsyms]  [k] __blk_flush_plug
>      0.01%  fio  [kernel.kallsyms]  [k] blk_add_rq_to_plug
>      0.01%  fio  [kernel.kallsyms]  [k] blk_mq_flush_plug_list
>      0.00%  fio  [kernel.kallsyms]  [k] blk_attempt_plug_merge
>
> Callgraph.txt
>
>      2.41%  fio  [kernel.kallsyms]  [k] cpupri_set
>             |
>             ---cpupri_set
>                |
>                |--1.15%--__enqueue_rt_entity
>                |          enqueue_task_rt
>                |          enqueue_task
>                |          ttwu_do_activate
>
> Perf report
> Without blk_start_plug and blk_finish_plug before calling __submit_bio():
> Top.txt
>      0.67%  fio  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
>      0.64%  fio  [kernel.kallsyms]  [k] sched_balance_newidle
>      0.47%  fio  [kernel.kallsyms]  [k] _raw_spin_lock
>      0.39%  fio  [kernel.kallsyms]  [k] sbitmap_find_bit
>      0.35%  fio  [kernel.kallsyms]  [k] cpupri_set
>      0.28%  fio  [kernel.kallsyms]  [k] work_grab_pending
>      0.24%  fio  [kernel.kallsyms]  [k] lookup_ioctx
>      0.23%  fio  [kernel.kallsyms]  [k] __schedule
>      …
>      …
>      0.00%  fio  [kernel.kallsyms]  [k] blk_attempt_plug_merge
>
> Call graph.txt:
>
>      0.35%  fio  [kernel.kallsyms]  [k] cpupri_set
>             |
>             ---cpupri_set
>                |
>                |--0.17%--arch_local_irq_restore.part.0
>                |          |
>                |          |--0.14%--finish_task_switch.isra.0
>                |          |          __schedule
>                |          |          |
>                |          |          |--0.13%--schedule
>                |          |          |          |
>                |          |          |          |--0.07%--read_events
>                …..
>                |--0.13%--__enqueue_rt_entity
>                |          enqueue_task_rt
>                |          enqueue_task
>                |          ttwu_do_activate
>
> From the above perf data, it looks like:
> 1. More time is spent in cpupri_set(): tasks are being enqueued/dequeued frequently, more IO scheduling.
> 2. More plug routines are called.
>
> If you need the full perf data report, I can email/attach the full report.

I think this is a corner case for the qd=1 raw disk test, and I'm fine with reverting the commit to fix this problem if needed.

>
> Thanks for your help!
> Wendy

--
Thanks,
Kuai

* Re: Observing higher CPU utilization during random IO fio testing
  2026-05-21 19:44 Observing higher CPU utilization during random IO fio testing Wen Xiong
  2026-05-21 21:52 ` Jens Axboe
@ 2026-05-30  1:10 ` Ming Lei
  2026-05-31 11:56   ` Yu Kuai
  1 sibling, 1 reply; 8+ messages in thread
From: Ming Lei @ 2026-05-30 1:10 UTC
  To: Wen Xiong; +Cc: linux-block, axboe, jmoyer, Gjoyce, wenxiong

On Thu, May 21, 2026 at 02:44:22PM -0500, Wen Xiong wrote:
> Hi All,
>
> Our performance team observed higher CPU utilization in RHEL10 compared
> to RHEL9.8, and observed a similar issue in the upstream kernel (v7.1-rc4)
> as well, when running FIO random IO tests.
>
> System configuration:
> 47 dedicated cores
> 120 GB memory
> PCIe4 2-Port 64Gb FC Adapter
> FlashSystem: FS9500, 12 LUNs/FC port, 100G each LUN.
>
> Random IO tests are more CPU intensive than sequential IO tests due to
> several factors: more context switching, interrupt handling, cache
> inefficiency, etc. We found that the following patch caused the higher
> CPU utilization in RHEL10 and newer Linux kernels:
>
> commit 060406c61c7cb4bbd82a02d179decca9c9bb3443 (HEAD)
> Author: Yu Kuai <yukuai3@huawei.com>
> Date:   Thu May 9 20:38:25 2024 +0800
>
>     block: add plug while submitting IO
>
>     So that if caller didn't use plug, for example, __blkdev_direct_IO_simple()
>     and __blkdev_direct_IO_async(), block layer can still benefit from caching
>     nsec time in the plug.
>
>     Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>     Link:
> https://lore.kernel.org/r/20240509123825.3225207-1-yukuai1@huaweicloud.com
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> We reverted the above patch in the RHEL10 kernel and in upstream 7.1-rc4,
> and saw lower CPU utilization when running the same FIO test.
>
> The patch adds plugging in __submit_bio() in the block layer, which may
> cause performance degradation:
> - Random IO tests have less merging, and plugging adds flush overhead.
> - More IO scheduler interaction: requests are forced through the scheduler
> instead of direct dispatch to the hardware queue.
> - Poor cache locality during the plug operation.

Yes, a regression is expected on QD=1 workloads.

Adding an inner plug only for caching the timestamp is not good from the plug function's viewpoint, because only the outer code path (io_uring, libaio, ...) knows the exact IO batch size and can decide whether a plug should be used.

Given that 060406c61c7c ("block: add plug while submitting IO") doesn't provide any performance data, maybe it can be reverted.

I am wondering, why not move the timestamp cache into 'task_struct' so that it gets wider use?

Thanks,
Ming
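
The trade-off described in the message above, a timestamp cache whose benefit depends on how many timestamps fall inside one plug scope, can be illustrated with a toy model. This is a Python sketch under stated assumptions and is not kernel code: `Clock`, `Plug`, `time_get_ns` and `clock_reads` are hypothetical names, the model counts only clock reads, and it deliberately ignores the cost of starting and flushing a plug as well as any cache invalidation when a task is scheduled out.

```python
import itertools


class Clock:
    """Stand-in for an expensive clock read; counts how often it is read."""

    def __init__(self):
        self.reads = 0
        self._ticks = itertools.count(1)

    def now_ns(self):
        self.reads += 1
        return next(self._ticks)


class Plug:
    """Toy plug: holds at most one cached timestamp for its whole scope."""

    def __init__(self):
        self.cached_ns = None


def time_get_ns(clock, plug):
    """Return the plug's cached time if a plug is active, else read the clock."""
    if plug is None:
        return clock.now_ns()
    if plug.cached_ns is None:
        plug.cached_ns = clock.now_ns()
    return plug.cached_ns


def clock_reads(batch_size, stamps_per_io, use_plug):
    """Count clock reads for one submission of batch_size IOs."""
    clock = Clock()
    plug = Plug() if use_plug else None
    for _ in range(batch_size * stamps_per_io):
        time_get_ns(clock, plug)
    return clock.reads


# (batch size, timestamps taken per IO); both values are illustrative.
for batch_size, stamps_per_io in [(1, 1), (1, 3), (32, 1)]:
    saved = (clock_reads(batch_size, stamps_per_io, use_plug=False)
             - clock_reads(batch_size, stamps_per_io, use_plug=True))
    print(f"batch={batch_size} stamps/io={stamps_per_io}: {saved} clock reads saved")
```

In this model a single IO that takes one timestamp saves nothing by being plugged, while larger batches or several timestamps per IO do save reads. That is consistent with the point that only the caller knows the batch size, though the real cost balance depends on details the sketch leaves out.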

* Re: Observing higher CPU utilization during random IO fio testing
  2026-05-30  1:10 ` Ming Lei
@ 2026-05-31 11:56   ` Yu Kuai
  0 siblings, 0 replies; 8+ messages in thread
From: Yu Kuai @ 2026-05-31 11:56 UTC
  To: Ming Lei, Wen Xiong; +Cc: linux-block, axboe, jmoyer, Gjoyce, wenxiong, yukuai

Hi,

On 2026/5/30 9:10, Ming Lei wrote:
> On Thu, May 21, 2026 at 02:44:22PM -0500, Wen Xiong wrote:
>> Hi All,
>>
>> Our performance team observed higher CPU utilization in RHEL10 compared
>> to RHEL9.8, and observed a similar issue in the upstream kernel (v7.1-rc4)
>> as well, when running FIO random IO tests.
>>
>> System configuration:
>> 47 dedicated cores
>> 120 GB memory
>> PCIe4 2-Port 64Gb FC Adapter
>> FlashSystem: FS9500, 12 LUNs/FC port, 100G each LUN.
>>
>> Random IO tests are more CPU intensive than sequential IO tests due to
>> several factors: more context switching, interrupt handling, cache
>> inefficiency, etc. We found that the following patch caused the higher
>> CPU utilization in RHEL10 and newer Linux kernels:
>>
>> commit 060406c61c7cb4bbd82a02d179decca9c9bb3443 (HEAD)
>> Author: Yu Kuai <yukuai3@huawei.com>
>> Date:   Thu May 9 20:38:25 2024 +0800
>>
>>     block: add plug while submitting IO
>>
>>     So that if caller didn't use plug, for example, __blkdev_direct_IO_simple()
>>     and __blkdev_direct_IO_async(), block layer can still benefit from caching
>>     nsec time in the plug.
>>
>>     Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>     Link:
>> https://lore.kernel.org/r/20240509123825.3225207-1-yukuai1@huaweicloud.com
>>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> We reverted the above patch in the RHEL10 kernel and in upstream 7.1-rc4,
>> and saw lower CPU utilization when running the same FIO test.
>>
>> The patch adds plugging in __submit_bio() in the block layer, which may
>> cause performance degradation:
>> - Random IO tests have less merging, and plugging adds flush overhead.
>> - More IO scheduler interaction: requests are forced through the scheduler
>> instead of direct dispatch to the hardware queue.
>> - Poor cache locality during the plug operation.
> Yes, a regression is expected on QD=1 workloads.
>
> Adding an inner plug only for caching the timestamp is not good from the plug function's viewpoint,
> because only the outer code path (io_uring, libaio, ...) knows the exact IO batch size
> and can decide whether a plug should be used.
>
> Given that 060406c61c7c ("block: add plug while submitting IO") doesn't provide
> any performance data, maybe it can be reverted.
>
> I am wondering, why not move the timestamp cache into 'task_struct' so that it gets wider use?

Yes, this is exactly what we did in downstream kernels: the time is cached in task_struct, and IO completion also uses it. This is probably why we don't see this regression.

>
>
> Thanks,
> Ming
>

--
Thanks,
Kuai