public inbox for linux-block@vger.kernel.org
From: Jialin Wang <wjl.linux@gmail.com>
To: axboe@kernel.dk, yukuai@fnnas.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	lianux.mm@gmail.com, lenohou@gmail.com,
	Jialin Wang <wjl.linux@gmail.com>
Subject: [PATCH] block: fix race in update_io_ticks causing inflated disk statistics
Date: Sat, 28 Feb 2026 18:01:44 +0800
Message-ID: <20260228100144.254436-1-wjl.linux@gmail.com>

When multiple threads issue I/O requests concurrently after a period of
disk idle time, iostat can report abnormal %util spikes (100%+) even
when the actual I/O load is extremely light.

This issue can be reproduced with fio by binding 8 threads to different
CPUs and having each issue a 4KB I/O every second:

  fio --name=test --ioengine=sync --rw=randwrite --direct=1 --bs=4k \
    --numjobs=8 --cpus_allowed=0-7 --cpus_allowed_policy=split \
    --thinktime=1s --time_based --runtime=60 --group_reporting \
    --filename=/mnt/sdb/test

The iostat -d sdb 1 output will randomly show a false 100%+ %util:

Device  ...    w/s   wkB/s   wrqm/s  %wrqm w_await wareq-sz  ...  aqu-sz  %util
sdb     ...  16.00  104.00     0.00   0.00    1.25     6.50  ...    0.02   0.90

Device  ...    w/s   wkB/s   wrqm/s  %wrqm w_await wareq-sz  ...  aqu-sz  %util
sdb     ...   8.00   32.00     0.00   0.00    1.38     4.00  ...    0.01 100.30

Device  ...    w/s   wkB/s   wrqm/s  %wrqm w_await wareq-sz  ...  aqu-sz  %util
sdb     ...   8.00   32.00     0.00   0.00    1.38     4.00  ...    0.01   0.20

Device  ...    w/s   wkB/s   wrqm/s  %wrqm w_await wareq-sz  ...  aqu-sz  %util
sdb     ...  11.00   44.00     0.00   0.00    1.27     4.00  ...    0.01  82.80

The root cause is a race condition in update_io_ticks(). When the disk
has been idle for a while (e.g., 1 second), part->bd_stamp holds an
old timestamp. If CPU A and CPU B start I/O at the exact same time:

1. Both CPUs read the same old 'stamp' and pass the time_after() check.
2. CPU A executes try_cmpxchg() successfully.
3. CPU B fails try_cmpxchg(), exits update_io_ticks(), and immediately
   increments its local in_flight counter via part_stat_local_inc().
4. CPU A continues to evaluate the 'busy' condition:
   end || bdev_count_inflight(part).
5. Since it is an I/O start, 'end' is false, so CPU A calls
   bdev_count_inflight() to check.
6. However, bdev_count_inflight() iterates over all CPUs and sees CPU B's
   newly incremented in_flight count, so it returns a nonzero count and
   the busy check evaluates true.
7. CPU A incorrectly assumes the disk was busy during the entire
   'now - stamp' window (the 1-second idle period) and adds this large
   delta to io_ticks.

Fix this by capturing the 'busy' state before performing the
try_cmpxchg(). Taking the snapshot of whether the device is active
before updating bd_stamp prevents CPU A from being misled by concurrent
I/O submissions from other CPUs that occur after its timestamp read but
before the inflight check.

Fixes: 99dc422335d8 ("block: support to account io_ticks precisely")
Signed-off-by: Jialin Wang <wjl.linux@gmail.com>
---
 block/blk-core.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 474700ffaa1c..1481daf1e664 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1026,10 +1026,11 @@ void update_io_ticks(struct block_device *part, unsigned long now, bool end)
 	unsigned long stamp;
 again:
 	stamp = READ_ONCE(part->bd_stamp);
-	if (unlikely(time_after(now, stamp)) &&
-	    likely(try_cmpxchg(&part->bd_stamp, &stamp, now)) &&
-	    (end || bdev_count_inflight(part)))
-		__part_stat_add(part, io_ticks, now - stamp);
+	if (unlikely(time_after(now, stamp))) {
+		bool busy = end || bdev_count_inflight(part);
+		if (likely(try_cmpxchg(&part->bd_stamp, &stamp, now)) && busy)
+			__part_stat_add(part, io_ticks, now - stamp);
+	}
 
 	if (bdev_is_partition(part)) {
 		part = bdev_whole(part);
-- 
2.52.0


Thread overview: 3+ messages
2026-02-28 10:01 Jialin Wang [this message]
2026-03-04 14:24 ` [PATCH] block: fix race in update_io_ticks causing inflated disk statistics Jialin Wang
2026-03-05 17:16 ` Yu Kuai
