public inbox for cgroups@vger.kernel.org
* [PATCH] blk-iocost: do not lower busy_level when no IOs are completed
@ 2026-03-18 16:33 Jialin Wang
From: Jialin Wang @ 2026-03-18 16:33 UTC (permalink / raw)
  To: tj, josef, axboe, yukuai; +Cc: cgroups, linux-block, linux-kernel, wjl.linux

In ioc_timer_fn(), iocost evaluates the device latency statistics to
adjust the vrate. However, the logic misbehaves when no IOs are
completed during a timer period.

In such cases, ioc_lat_stat() returns both missed_ppm and rq_wait_pct
as zero. iocost incorrectly treats these zeros as a sign that the device
is meeting its QoS targets and resets busy_level to 0. This prevents
busy_level from ever reaching the threshold needed to reduce vrate,
leaving iocost ineffective even when the device is severely overloaded.

This issue was observed while testing iocost on cloud disks. Under
normal conditions, a 1MB IO has an average latency of ~6.5ms; after
iocost subtracts the size-based cost (size_nsec), the remaining latency
is ~1.5ms. Based on this, rlat and wlat were set to 3000us. However,
when fio with iodepth=128 was used to hit the cloud provider's BPS/IOPS
limits, latency spiked to ~800ms as observed via iostat. Under this
pressure, timer periods with zero IO completions were common, so
busy_level stayed near zero and vrate failed to scale down for over
10 seconds at a time.

Fix this by tracking the number of completed IOs (nr_done). The branches
that lower busy_level now require nr_done > 0. If no IOs were completed,
keep the current busy_level, since the device latency for the period is
unknown.

Due to limited resources, I have only tested this patch on Azure
cloud disks. I am unsure of its impact on other types of hardware.
Testing on different storage devices would be highly appreciated.

Signed-off-by: Jialin Wang <wjl.linux@gmail.com>
---
 block/blk-iocost.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index d145db61e5c3..8b3bec6ea27e 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1596,7 +1596,8 @@ static enum hrtimer_restart iocg_waitq_timer_fn(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-static void ioc_lat_stat(struct ioc *ioc, u32 *missed_ppm_ar, u32 *rq_wait_pct_p)
+static void ioc_lat_stat(struct ioc *ioc, u32 *missed_ppm_ar, u32 *rq_wait_pct_p,
+			 u32 *nr_done)
 {
 	u32 nr_met[2] = { };
 	u32 nr_missed[2] = { };
@@ -1633,6 +1634,8 @@ static void ioc_lat_stat(struct ioc *ioc, u32 *missed_ppm_ar, u32 *rq_wait_pct_p
 
 	*rq_wait_pct_p = div64_u64(rq_wait_ns * 100,
 				   ioc->period_us * NSEC_PER_USEC);
+
+	*nr_done = nr_met[READ] + nr_met[WRITE] + nr_missed[READ] + nr_missed[WRITE];
 }
 
 /* was iocg idle this period? */
@@ -2250,12 +2253,12 @@ static void ioc_timer_fn(struct timer_list *timer)
 	u64 usage_us_sum = 0;
 	u32 ppm_rthr;
 	u32 ppm_wthr;
-	u32 missed_ppm[2], rq_wait_pct;
+	u32 missed_ppm[2], rq_wait_pct, nr_done;
 	u64 period_vtime;
 	int prev_busy_level;
 
 	/* how were the latencies during the period? */
-	ioc_lat_stat(ioc, missed_ppm, &rq_wait_pct);
+	ioc_lat_stat(ioc, missed_ppm, &rq_wait_pct, &nr_done);
 
 	/* take care of active iocgs */
 	spin_lock_irq(&ioc->lock);
@@ -2403,7 +2406,8 @@ static void ioc_timer_fn(struct timer_list *timer)
 		/* clearly missing QoS targets, slow down vrate */
 		ioc->busy_level = max(ioc->busy_level, 0);
 		ioc->busy_level++;
-	} else if (rq_wait_pct <= RQ_WAIT_BUSY_PCT * UNBUSY_THR_PCT / 100 &&
+	} else if (nr_done &&
+		   rq_wait_pct <= RQ_WAIT_BUSY_PCT * UNBUSY_THR_PCT / 100 &&
 		   missed_ppm[READ] <= ppm_rthr * UNBUSY_THR_PCT / 100 &&
 		   missed_ppm[WRITE] <= ppm_wthr * UNBUSY_THR_PCT / 100) {
 		/* QoS targets are being met with >25% margin */
@@ -2429,7 +2433,7 @@ static void ioc_timer_fn(struct timer_list *timer)
 			 */
 			ioc->busy_level = 0;
 		}
-	} else {
+	} else if (nr_done) {
 		/* inside the hysterisis margin, we're good */
 		ioc->busy_level = 0;
 	}
-- 
2.53.0

