From: Jialin Wang
To: axboe@kernel.dk, yukuai@fnnas.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    lianux.mm@gmail.com, lenohou@gmail.com, Jialin Wang
Subject: [PATCH] block: fix race in update_io_ticks causing inflated disk statistics
Date: Sat, 28 Feb 2026 18:01:44 +0800
Message-ID: <20260228100144.254436-1-wjl.linux@gmail.com>
X-Mailer: git-send-email 2.52.0

When multiple threads issue I/O requests concurrently after the disk has
been idle for a while, iostat can report abnormal %util spikes (100%+)
even when the actual I/O load is extremely light.

This issue can be reproduced using fio, by binding 8 fio threads to
different CPUs and having each of them issue one 4KB I/O per second:

  fio --name=test --ioengine=sync --rw=randwrite --direct=1 --bs=4k \
      --numjobs=8 --cpus_allowed=0-7 --cpus_allowed_policy=split \
      --thinktime=1s --time_based --runtime=60 --group_reporting \
      --filename=/mnt/sdb/test

The iostat -d sdb 1 output will randomly show a false 100%+ %util:

  Device ...    w/s   wkB/s  wrqm/s  %wrqm w_await wareq-sz ...  aqu-sz  %util
  sdb    ...  16.00  104.00    0.00   0.00    1.25     6.50 ...    0.02   0.90

  Device ...    w/s   wkB/s  wrqm/s  %wrqm w_await wareq-sz ...  aqu-sz  %util
  sdb    ...   8.00   32.00    0.00   0.00    1.38     4.00 ...    0.01 100.30

  Device ...    w/s   wkB/s  wrqm/s  %wrqm w_await wareq-sz ...  aqu-sz  %util
  sdb    ...   8.00   32.00    0.00   0.00    1.38     4.00 ...    0.01   0.20

  Device ...    w/s   wkB/s  wrqm/s  %wrqm w_await wareq-sz ...  aqu-sz  %util
  sdb    ...  11.00   44.00    0.00   0.00    1.27     4.00 ...    0.01  82.80

The root cause is a race condition in update_io_ticks(). When the disk
has been idle for a while (e.g., 1 second), part->bd_stamp holds an old
timestamp. If CPU A and CPU B start I/O at the same time:

1. Both CPUs read the same old 'stamp' and pass the time_after() check.
2. CPU A executes try_cmpxchg() successfully.
3. CPU B fails try_cmpxchg(), exits update_io_ticks(), and immediately
   increments its local in_flight counter via part_stat_local_inc().
4. CPU A continues to evaluate the 'busy' condition:
   end || bdev_count_inflight(part).
5. Since this is an I/O start, 'end' is false, so CPU A calls
   bdev_count_inflight() to check.
6. However, bdev_count_inflight() iterates over all CPUs and sees CPU B's
   newly incremented in_flight count, so it returns a nonzero value.
7. CPU A incorrectly assumes the disk was busy during the entire
   'now - stamp' window (the 1-second idle period) and adds this large
   delta to io_ticks.

Fix this by capturing the 'busy' state before performing the
try_cmpxchg(). Taking a snapshot of whether the device is active before
bd_stamp is updated prevents CPU A from being misled by I/O that other
CPUs submit after losing the try_cmpxchg() race but before CPU A's
inflight check.

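For illustration only, the interleaving in steps 1-7 can be replayed
deterministically in userspace. The sketch below is a single-threaded toy
model, not kernel code: struct toy_part and replay_idle_start_race() are
hypothetical names, the per-CPU in_flight counters are collapsed into one
integer, and try_cmpxchg() is modelled as a plain store because CPU A is
assumed to win the race.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the fields update_io_ticks() touches (hypothetical). */
struct toy_part {
	unsigned long bd_stamp;	/* time of the last io_ticks update */
	unsigned long io_ticks;	/* accumulated busy time */
	unsigned int in_flight;	/* sum of all per-CPU in-flight counters */
};

/*
 * Replay steps 1-7: CPU A and CPU B both start an I/O at 'now' on a
 * device that has been idle since 'idle_stamp'.
 *
 * check_busy_first == false models the old ordering (busy is evaluated
 * after the stamp update); true models the patched ordering (busy is
 * sampled before the stamp update).
 *
 * Returns the io_ticks that CPU A accounts for this I/O start.
 */
unsigned long replay_idle_start_race(unsigned long idle_stamp,
				     unsigned long now,
				     bool check_busy_first)
{
	struct toy_part part = { .bd_stamp = idle_stamp };
	unsigned long stamp = part.bd_stamp;	/* step 1: A and B read it */
	bool end = false;			/* this is an I/O start */
	bool busy = false;

	assert(now > stamp);	/* both CPUs pass the time_after() check */

	if (check_busy_first)
		busy = end || part.in_flight;	/* patched: snapshot first */

	part.bd_stamp = now;	/* step 2: CPU A wins try_cmpxchg() */
	part.in_flight++;	/* step 3: CPU B loses, returns, starts I/O */

	if (!check_busy_first)
		busy = end || part.in_flight;	/* steps 4-6: sees CPU B */

	if (busy)
		part.io_ticks += now - stamp;	/* step 7: idle gap added */

	return part.io_ticks;
}
```

With a 1000-tick idle gap, replay_idle_start_race(0, 1000, false) returns
1000 (the whole idle period is charged as busy time), while
replay_idle_start_race(0, 1000, true) returns 0. The model covers only
this one interleaving and says nothing about other orderings.
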
Fixes: 99dc422335d8 ("block: support to account io_ticks precisely")
Signed-off-by: Jialin Wang
---
 block/blk-core.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 474700ffaa1c..1481daf1e664 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1026,10 +1026,11 @@ void update_io_ticks(struct block_device *part, unsigned long now, bool end)
 	unsigned long stamp;
 again:
 	stamp = READ_ONCE(part->bd_stamp);
-	if (unlikely(time_after(now, stamp)) &&
-	    likely(try_cmpxchg(&part->bd_stamp, &stamp, now)) &&
-	    (end || bdev_count_inflight(part)))
-		__part_stat_add(part, io_ticks, now - stamp);
+	if (unlikely(time_after(now, stamp))) {
+		bool busy = end || bdev_count_inflight(part);
+		if (likely(try_cmpxchg(&part->bd_stamp, &stamp, now)) && busy)
+			__part_stat_add(part, io_ticks, now - stamp);
+	}
 
 	if (bdev_is_partition(part)) {
 		part = bdev_whole(part);
-- 
2.52.0