From: Usama Arif <usama.arif@linux.dev>
To: axboe@kernel.dk, linux-block@vger.kernel.org, bsegall@google.com,
dietmar.eggemann@arm.com, juri.lelli@redhat.com,
kprateek.nayak@amd.com, linux-kernel@vger.kernel.org,
mgorman@suse.de, mingo@redhat.com, peterz@infradead.org,
rostedt@goodmis.org, vincent.guittot@linaro.org,
vschneid@redhat.com
Cc: shakeel.butt@linux.dev, hannes@cmpxchg.org, riel@surriel.com,
kernel-team@meta.com, Usama Arif <usama.arif@linux.dev>,
stable@vger.kernel.org
Subject: [PATCH v2] block: invalidate cached plug timestamp after task switch
Date: Fri, 12 Jun 2026 02:40:42 -0700
Message-ID: <20260612094042.3350401-1-usama.arif@linux.dev>

blk_time_get_ns() caches the result of ktime_get_ns() in
current->plug->cur_ktime and marks the task with PF_BLOCK_TS. That cache
is only valid while the task keeps running: once the task is switched
out, time keeps advancing, so the cached value must not be reused when
the task runs again.

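The caching pattern described above can be illustrated with a small userspace model in C. This is a sketch, not kernel code. `struct model_task`, `struct model_plug`, `model_time_get_ns()`, `model_plug_invalidate_ts()` and the fake clock `model_clock_ns` are all hypothetical names. The real blk_time_get_ns() also checks conditions that this model leaves out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for task_struct / blk_plug; not kernel types. */
#define MODEL_PF_BLOCK_TS 0x1u

struct model_plug {
	uint64_t cur_ktime;	/* 0 means "no cached timestamp" */
};

struct model_task {
	unsigned int flags;
	struct model_plug *plug;
};

/* Fake monotonic clock (assumption) so the behaviour is deterministic. */
static uint64_t model_clock_ns = 1000;

static uint64_t model_ktime_get_ns(void)
{
	return model_clock_ns;
}

/* Models the caching pattern of blk_time_get_ns(). */
uint64_t model_time_get_ns(struct model_task *tsk)
{
	struct model_plug *plug = tsk->plug;

	if (!plug)
		return model_ktime_get_ns();	/* no plug: never cache */

	if (!plug->cur_ktime) {
		plug->cur_ktime = model_ktime_get_ns();
		tsk->flags |= MODEL_PF_BLOCK_TS;	/* remember a cache exists */
	}
	return plug->cur_ktime;
}

/* Models the v2 blk_plug_invalidate_ts(): the flag check lives inside. */
void model_plug_invalidate_ts(struct model_task *tsk)
{
	if (tsk->flags & MODEL_PF_BLOCK_TS) {
		if (tsk->plug)
			tsk->plug->cur_ktime = 0;
		tsk->flags &= ~MODEL_PF_BLOCK_TS;
	}
}
```

A zero cur_ktime doubles as "nothing cached". That is why invalidation only needs to clear the field and the flag.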
The existing invalidation covers explicit plug flushes through
__blk_flush_plug(), and the schedule() / rtmutex paths through
sched_update_worker(). It does not cover in-kernel preemption paths such
as preempt_schedule(), preempt_schedule_notrace(), and
preempt_schedule_irq(), which enter __schedule(SM_PREEMPT) directly and
return without calling sched_update_worker().

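The coverage gap can be shown with a similar hypothetical model. It assumes, as the paragraph above states, that only the schedule() and rtmutex paths reach a sched_update_worker()-style hook, and that the preempt_schedule*() family does not. The enum values and helpers are illustrative names, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the pre-patch invalidation coverage. */
#define MODEL_PF_BLOCK_TS 0x1u

enum model_entry {
	MODEL_SCHEDULE,	/* schedule(): returns via sched_update_worker() */
	MODEL_RTMUTEX,	/* rtmutex path: also reaches sched_update_worker() */
	MODEL_PREEMPT,	/* preempt_schedule*(): __schedule(SM_PREEMPT) only */
};

struct model_task {
	unsigned int flags;
	uint64_t cur_ktime;	/* stands in for current->plug->cur_ktime */
};

static uint64_t model_clock_ns = 1000;	/* fake clock (assumption) */

static uint64_t model_time_get_ns(struct model_task *tsk)
{
	if (!tsk->cur_ktime) {
		tsk->cur_ktime = model_clock_ns;
		tsk->flags |= MODEL_PF_BLOCK_TS;
	}
	return tsk->cur_ktime;
}

static void model_invalidate(struct model_task *tsk)
{
	if (tsk->flags & MODEL_PF_BLOCK_TS) {
		tsk->cur_ktime = 0;
		tsk->flags &= ~MODEL_PF_BLOCK_TS;
	}
}

/* True if this entry point reached the old invalidation hook. */
static bool model_old_entry_invalidates(enum model_entry entry)
{
	return entry == MODEL_SCHEDULE || entry == MODEL_RTMUTEX;
}

/*
 * Switch the task out, let time pass, switch it back in, then read the
 * block timestamp as the resumed task would.
 */
uint64_t model_old_switch_and_read(struct model_task *tsk,
				   enum model_entry entry, uint64_t off_cpu_ns)
{
	model_clock_ns += off_cpu_ns;	/* time advances while off-CPU */
	if (model_old_entry_invalidates(entry))
		model_invalidate(tsk);	/* sched_update_worker() analogue */
	return model_time_get_ns(tsk);
}
```

After a preemption-style switch, the model returns the timestamp cached before the task went off-CPU, even though the fake clock has moved on.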
As a result, a task preempted while holding a plug with PF_BLOCK_TS set
can reuse a stale plug->cur_ktime after it is scheduled back in.
blk-iocost then consumes that stale timestamp in two places: through
ioc_now(), producing stale vnow values for throttling decisions, and
through ioc_rqos_done(), inflating on-queue time and feeding false
missed-QoS samples into vrate adjustment.

Move the schedule-side invalidation to finish_task_switch(), which runs
for the scheduled-in task after every actual context switch, regardless
of which scheduler entry point led to it. Keep __blk_flush_plug() as the
explicit flush/finish-plug invalidation path, and remove only the
PF_BLOCK_TS handling from sched_update_worker().

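The fix can be modeled the same way. Invalidation is no longer tied to particular entry points: every switch-in passes through one hook, which plays the role of finish_task_switch(). As before, all names are hypothetical, and the model assumes that every call represents an actual context switch.

```c
#include <stdint.h>

/* Hypothetical model of the post-patch behaviour; not kernel code. */
#define MODEL_PF_BLOCK_TS 0x1u

enum model_entry { MODEL_SCHEDULE, MODEL_RTMUTEX, MODEL_PREEMPT };

struct model_task {
	unsigned int flags;
	uint64_t cur_ktime;	/* stands in for current->plug->cur_ktime */
};

static uint64_t model_clock_ns = 1000;	/* fake clock (assumption) */

static uint64_t model_time_get_ns(struct model_task *tsk)
{
	if (!tsk->cur_ktime) {
		tsk->cur_ktime = model_clock_ns;
		tsk->flags |= MODEL_PF_BLOCK_TS;
	}
	return tsk->cur_ktime;
}

static void model_invalidate(struct model_task *tsk)
{
	if (tsk->flags & MODEL_PF_BLOCK_TS) {
		tsk->cur_ktime = 0;
		tsk->flags &= ~MODEL_PF_BLOCK_TS;
	}
}

/* finish_task_switch() analogue: runs for the incoming task every time. */
static void model_finish_task_switch(struct model_task *next)
{
	model_invalidate(next);
}

uint64_t model_new_switch_and_read(struct model_task *tsk,
				   enum model_entry entry, uint64_t off_cpu_ns)
{
	(void)entry;	/* every entry point funnels through the same hook */
	model_clock_ns += off_cpu_ns;	/* time advances while off-CPU */
	model_finish_task_switch(tsk);
	return model_time_get_ns(tsk);
}
```

Because the hook sits on the common switch-in path, the preemption case now sees a fresh timestamp, just like schedule() and the rtmutex path.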
Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
Cc: stable@vger.kernel.org
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
v1 -> v2: https://lore.kernel.org/all/20260611231428.345098-1-usama.arif@linux.dev/
- Drop the task_struct argument so the helper is just
blk_plug_invalidate_ts(), move the PF_BLOCK_TS flag check into
blk_plug_invalidate_ts(), and make it __always_inline (Peter Zijlstra).
---
include/linux/blkdev.h | 17 ++++++++---------
kernel/sched/core.c | 12 ++++++++----
2 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 57e84d59a642..1c1fd31ce187 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1216,16 +1216,15 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
__blk_flush_plug(plug, async);
}
-/*
- * tsk == current here
- */
-static inline void blk_plug_invalidate_ts(struct task_struct *tsk)
+static __always_inline void blk_plug_invalidate_ts(void)
{
- struct blk_plug *plug = tsk->plug;
+ if (unlikely(current->flags & PF_BLOCK_TS)) {
+ struct blk_plug *plug = current->plug;
- if (plug)
- plug->cur_ktime = 0;
- current->flags &= ~PF_BLOCK_TS;
+ if (plug)
+ plug->cur_ktime = 0;
+ current->flags &= ~PF_BLOCK_TS;
+ }
}
int blkdev_issue_flush(struct block_device *bdev);
@@ -1251,7 +1250,7 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
{
}
-static inline void blk_plug_invalidate_ts(struct task_struct *tsk)
+static inline void blk_plug_invalidate_ts(void)
{
}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b791e9e9f67..e97e98c33be5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5368,6 +5368,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
*/
kmap_local_sched_in();
+ /*
+ * Any cached block-layer timestamp (plug->cur_ktime) is stale now,
+ * invalidate it.
+ */
+ blk_plug_invalidate_ts();
+
fire_sched_in_preempt_notifiers(current);
/*
* When switching through a kernel thread, the loop in
@@ -7290,12 +7296,10 @@ static inline void sched_submit_work(struct task_struct *tsk)
static void sched_update_worker(struct task_struct *tsk)
{
- if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER | PF_BLOCK_TS)) {
- if (tsk->flags & PF_BLOCK_TS)
- blk_plug_invalidate_ts(tsk);
+ if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
if (tsk->flags & PF_WQ_WORKER)
wq_worker_running(tsk);
- else if (tsk->flags & PF_IO_WORKER)
+ else
io_wq_worker_running(tsk);
}
}
--
2.53.0-Meta