* [PATCH v2] block: invalidate cached plug timestamp after task switch
From: Usama Arif @ 2026-06-12 9:40 UTC
To: axboe, linux-block, bsegall, dietmar.eggemann, juri.lelli,
kprateek.nayak, linux-kernel, mgorman, mingo, peterz, rostedt,
vincent.guittot, vschneid
Cc: shakeel.butt, hannes, riel, kernel-team, Usama Arif, stable
blk_time_get_ns() caches the result of ktime_get_ns() in current->plug->cur_ktime
and marks the task with PF_BLOCK_TS. That cache is only valid while the
task keeps running; once the task is switched out, time keeps advancing,
so the cached value must not be reused when the task runs again.
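To make the caching concrete, here is a minimal userspace sketch of the pattern described above. It is a simplified model, not kernel code: `struct sim_task`, `struct sim_plug`, `SIM_PF_BLOCK_TS`, `sim_clock_ns`, `sim_time_get_ns()` and `sim_plug_invalidate_ts()` are hypothetical stand-ins for `current`, `struct blk_plug`, `PF_BLOCK_TS`, `ktime_get_ns()`, `blk_time_get_ns()` and `blk_plug_invalidate_ts()`, and any additional context checks the real helper performs are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel objects named in the text. */
#define SIM_PF_BLOCK_TS 0x1u

struct sim_plug {
	uint64_t cur_ktime;	/* 0 means "no cached timestamp" */
};

struct sim_task {
	unsigned int flags;
	struct sim_plug *plug;
};

/* Fake monotonic clock; the caller advances it by hand. */
static uint64_t sim_clock_ns;

/* Model of blk_time_get_ns(): cache the first reading in the plug. */
static uint64_t sim_time_get_ns(struct sim_task *tsk)
{
	struct sim_plug *plug = tsk->plug;

	if (!plug)
		return sim_clock_ns;	/* no plug: always read the clock */

	if (!plug->cur_ktime) {
		plug->cur_ktime = sim_clock_ns;
		tsk->flags |= SIM_PF_BLOCK_TS;	/* remember that a value is cached */
	}
	return plug->cur_ktime;
}

/* Model of the invalidation: drop the cached value and the flag. */
static void sim_plug_invalidate_ts(struct sim_task *tsk)
{
	if (tsk->flags & SIM_PF_BLOCK_TS) {
		if (tsk->plug)
			tsk->plug->cur_ktime = 0;
		tsk->flags &= ~SIM_PF_BLOCK_TS;
	}
}
```

If sim_clock_ns moves from 100 to 5100 between two calls and nothing invalidates the cache in between (the task was "switched out" meanwhile), the second call still returns 100; after sim_plug_invalidate_ts() the next call returns 5100.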
The existing invalidation covers explicit plug flushes through
__blk_flush_plug(), and the schedule() / rtmutex paths through
sched_update_worker(). It does not cover in-kernel preemption paths such
as preempt_schedule(), preempt_schedule_notrace(), and
preempt_schedule_irq(), which enter __schedule(SM_PREEMPT) directly and
return without calling sched_update_worker().
As a result, a task preempted while holding a plug with PF_BLOCK_TS set
can reuse a stale plug->cur_ktime after it is scheduled back in. blk-iocost
then consumes that stale timestamp through ioc_now(), producing stale vnow
values for throttle decisions, and through ioc_rqos_done(), inflating
on-queue time and feeding false missed-QoS samples into vrate
adjustment.
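The effect on on-queue time is plain arithmetic. The sketch below is a simplified model of a met/missed QoS check that loosely follows the description above; `sim_qos_missed()` and its parameters (`alloc_ns`, `size_ns`, `target_ns`) are hypothetical and do not reproduce ioc_rqos_done() exactly. If the allocation timestamp comes from a cache filled before a 5 ms preemption, a request that really spent 200 us on the queue is charged 5.2 ms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: a completion counts as "missed" when the time the
 * request spent on the queue, minus the time its size alone justifies,
 * exceeds the latency target.
 */
static bool sim_qos_missed(uint64_t now_ns, uint64_t alloc_ns,
			   uint64_t size_ns, uint64_t target_ns)
{
	/* Inflated when alloc_ns is a stale, too-early cached timestamp. */
	uint64_t on_q_ns = now_ns - alloc_ns;

	if (on_q_ns <= size_ns)
		return false;
	return on_q_ns - size_ns > target_ns;
}
```

With a fresh allocation time of 6.0 ms, completion at 6.2 ms, a 50 us size cost and a 1 ms target, the request meets its target; if the allocation time is instead the 1.0 ms value cached before the preemption, the same request is counted as a miss.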
Move the schedule-side invalidation to finish_task_switch(), which runs
for the scheduled-in task after every actual context switch regardless
of which schedule entry point was used. Keep __blk_flush_plug() as the
explicit flush/finish-plug invalidation path, and remove only the
PF_BLOCK_TS handling from sched_update_worker().
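The difference between the two placements can be modelled in userspace. In this sketch all `sim_*` names are hypothetical, and it ignores locking, runqueues and everything else `__schedule()` really does: every entry point funnels through one core switch routine, and a flag selects whether the invalidation lives in the post-switch hook or only on the `schedule()` return path. With the old placement a preemption round trip leaves the stale timestamp behind; with the hook in the post-switch step both entry points clear it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PF_BLOCK_TS 0x1u

struct sim_plug { uint64_t cur_ktime; };
struct sim_task { unsigned int flags; struct sim_plug *plug; };

static struct sim_task *sim_current;

/* Selects which placement of the invalidation is being modelled. */
static bool sim_hook_in_finish_task_switch;

static void sim_plug_invalidate_ts(void)
{
	if (sim_current->flags & SIM_PF_BLOCK_TS) {
		if (sim_current->plug)
			sim_current->plug->cur_ktime = 0;
		sim_current->flags &= ~SIM_PF_BLOCK_TS;
	}
}

/* Runs for the incoming task after every real context switch. */
static void sim_finish_task_switch(void)
{
	if (sim_hook_in_finish_task_switch)
		sim_plug_invalidate_ts();
}

/* Shared core used by every entry point: switch, then post-switch hook. */
static void sim_core_schedule(struct sim_task *next)
{
	sim_current = next;
	sim_finish_task_switch();
}

/* schedule()-like round trip: switched out to @other, later back in. */
static void sim_schedule(struct sim_task *other)
{
	struct sim_task *self = sim_current;

	sim_core_schedule(other);
	sim_core_schedule(self);
	if (!sim_hook_in_finish_task_switch)
		sim_plug_invalidate_ts();	/* old sched_update_worker() placement */
}

/* preempt_schedule()-like round trip: no post-schedule hook at all. */
static void sim_preempt_schedule(struct sim_task *other)
{
	struct sim_task *self = sim_current;

	sim_core_schedule(other);
	sim_core_schedule(self);
}
```

Putting the invalidation in the shared post-switch step means an entry point cannot bypass it, which mirrors the argument for finish_task_switch() above.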
Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
Cc: stable@vger.kernel.org
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
v1 -> v2: https://lore.kernel.org/all/20260611231428.345098-1-usama.arif@linux.dev/
- Make the function just blk_plug_invalidate_ts(), move the check for the
PF_BLOCK_TS flag into blk_plug_invalidate_ts() and make it __always_inline
(Peter Zijlstra).
---
include/linux/blkdev.h | 17 ++++++++---------
kernel/sched/core.c | 12 ++++++++----
2 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 57e84d59a642..1c1fd31ce187 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1216,16 +1216,15 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
 		__blk_flush_plug(plug, async);
 }
 
-/*
- * tsk == current here
- */
-static inline void blk_plug_invalidate_ts(struct task_struct *tsk)
+static __always_inline void blk_plug_invalidate_ts(void)
 {
-	struct blk_plug *plug = tsk->plug;
+	if (unlikely(current->flags & PF_BLOCK_TS)) {
+		struct blk_plug *plug = current->plug;
 
-	if (plug)
-		plug->cur_ktime = 0;
-	current->flags &= ~PF_BLOCK_TS;
+		if (plug)
+			plug->cur_ktime = 0;
+		current->flags &= ~PF_BLOCK_TS;
+	}
 }
 
 int blkdev_issue_flush(struct block_device *bdev);
@@ -1251,7 +1250,7 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
 {
 }
 
-static inline void blk_plug_invalidate_ts(struct task_struct *tsk)
+static inline void blk_plug_invalidate_ts(void)
 {
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b791e9e9f67..e97e98c33be5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5368,6 +5368,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	 */
 	kmap_local_sched_in();
 
+	/*
+	 * Any cached block-layer timestamp (plug->cur_ktime) is stale now,
+	 * invalidate it.
+	 */
+	blk_plug_invalidate_ts();
+
 	fire_sched_in_preempt_notifiers(current);
 	/*
 	 * When switching through a kernel thread, the loop in
@@ -7290,12 +7296,10 @@ static inline void sched_submit_work(struct task_struct *tsk)
 
 static void sched_update_worker(struct task_struct *tsk)
 {
-	if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER | PF_BLOCK_TS)) {
-		if (tsk->flags & PF_BLOCK_TS)
-			blk_plug_invalidate_ts(tsk);
+	if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
 		if (tsk->flags & PF_WQ_WORKER)
 			wq_worker_running(tsk);
-		else if (tsk->flags & PF_IO_WORKER)
+		else
 			io_wq_worker_running(tsk);
 	}
 }
--
2.53.0-Meta
* Re: [PATCH v2] block: invalidate cached plug timestamp after task switch
From: Peter Zijlstra @ 2026-06-12 9:45 UTC
To: Usama Arif
Cc: axboe, linux-block, bsegall, dietmar.eggemann, juri.lelli,
kprateek.nayak, linux-kernel, mgorman, mingo, rostedt,
vincent.guittot, vschneid, shakeel.butt, hannes, riel,
kernel-team, stable
On Fri, Jun 12, 2026 at 02:40:42AM -0700, Usama Arif wrote:
> +static __always_inline void blk_plug_invalidate_ts(void)
> {
> + if (unlikely(current->flags & PF_BLOCK_TS)) {
> + struct blk_plug *plug = current->plug;
>
> + if (plug)
> + plug->cur_ktime = 0;
> + current->flags &= ~PF_BLOCK_TS;
> + }
> }
If you can guarantee PF_BLOCK_TS is only ever set when current->plug,
this can be reduced further.
* Re: [PATCH v2] block: invalidate cached plug timestamp after task switch
From: Usama Arif @ 2026-06-12 10:02 UTC
To: Peter Zijlstra
Cc: axboe, linux-block, bsegall, dietmar.eggemann, juri.lelli,
kprateek.nayak, linux-kernel, mgorman, mingo, rostedt,
vincent.guittot, vschneid, shakeel.butt, hannes, riel,
kernel-team, stable
On 12/06/2026 10:45, Peter Zijlstra wrote:
> On Fri, Jun 12, 2026 at 02:40:42AM -0700, Usama Arif wrote:
>
>> +static __always_inline void blk_plug_invalidate_ts(void)
>> {
>> + if (unlikely(current->flags & PF_BLOCK_TS)) {
>> + struct blk_plug *plug = current->plug;
>>
>> + if (plug)
>> + plug->cur_ktime = 0;
>> + current->flags &= ~PF_BLOCK_TS;
>> + }
>> }
>
> If you can guarantee PF_BLOCK_TS is only ever set when current->plug,
> this can be reduced further.
Thanks for the reviews!
The invariant holds at set time (the only set in blk_time_get_ns() is
gated by if (!plug)) and through the only legitimate plug clear in
blk_finish_plug() (which goes through __blk_flush_plug() that clears
PF_BLOCK_TS first).
However, copy_process() sets p->plug = NULL for the child but doesn't
strip PF_BLOCK_TS from the inherited flags.
I think the if(plug) is a good defensive check, but can also do the below
if you prefer?
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1c1fd31ce187..c285a4d9837d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1219,10 +1219,7 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
 static __always_inline void blk_plug_invalidate_ts(void)
 {
 	if (unlikely(current->flags & PF_BLOCK_TS)) {
-		struct blk_plug *plug = current->plug;
-
-		if (plug)
-			plug->cur_ktime = 0;
+		current->plug->cur_ktime = 0;
 		current->flags &= ~PF_BLOCK_TS;
 	}
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index 892a95214c54..9a062149e0d8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2167,7 +2167,8 @@ __latent_entropy struct task_struct *copy_process(
 		goto bad_fork_cleanup_count;
 
 	delayacct_tsk_init(p);	/* Must remain after dup_task_struct() */
-	p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE | PF_NO_SETAFFINITY);
+	p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE | PF_NO_SETAFFINITY |
+		      PF_BLOCK_TS);
 	p->flags |= PF_FORKNOEXEC;
 	INIT_LIST_HEAD(&p->children);
 	INIT_LIST_HEAD(&p->sibling);
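The invariant discussed in this reply, that PF_BLOCK_TS implies a non-NULL plug, can be checked with a small userspace model. `sim_fork()` is a hypothetical stand-in for the flag/plug part of copy_process(): it copies the parent's flags, resets the plug, and optionally masks the flag as the proposed fork.c change does. `sim_plug_invalidate_ts_reduced()` models the reduced helper without the `if (plug)` check; the assert is an addition for this sketch so that a violated invariant fails loudly instead of dereferencing NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PF_BLOCK_TS 0x1u

struct sim_plug { uint64_t cur_ktime; };
struct sim_task { unsigned int flags; struct sim_plug *plug; };

/* The invariant the reduced helper relies on. */
static bool sim_invariant_holds(const struct sim_task *tsk)
{
	return !(tsk->flags & SIM_PF_BLOCK_TS) || tsk->plug != NULL;
}

/* Hypothetical model of the flag/plug handling in copy_process(). */
static struct sim_task sim_fork(const struct sim_task *parent, bool strip_block_ts)
{
	struct sim_task child = *parent;	/* flags are inherited */

	child.plug = NULL;			/* the child never inherits a plug */
	if (strip_block_ts)
		child.flags &= ~SIM_PF_BLOCK_TS;
	return child;
}

/* Reduced helper: no NULL check, valid only while the invariant holds. */
static void sim_plug_invalidate_ts_reduced(struct sim_task *tsk)
{
	if (tsk->flags & SIM_PF_BLOCK_TS) {
		assert(tsk->plug != NULL);
		tsk->plug->cur_ktime = 0;
		tsk->flags &= ~SIM_PF_BLOCK_TS;
	}
}
```

Calling the reduced helper on a child forked without the mask would trip the assert here (a NULL dereference in the kernel), which is why dropping the `if (plug)` check goes together with clearing the flag in fork.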
* Re: [PATCH v2] block: invalidate cached plug timestamp after task switch
From: Peter Zijlstra @ 2026-06-12 15:40 UTC
To: Usama Arif
Cc: axboe, linux-block, bsegall, dietmar.eggemann, juri.lelli,
kprateek.nayak, linux-kernel, mgorman, mingo, rostedt,
vincent.guittot, vschneid, shakeel.butt, hannes, riel,
kernel-team, stable
On Fri, Jun 12, 2026 at 11:02:58AM +0100, Usama Arif wrote:
>
>
> On 12/06/2026 10:45, Peter Zijlstra wrote:
> > On Fri, Jun 12, 2026 at 02:40:42AM -0700, Usama Arif wrote:
> >
> >> +static __always_inline void blk_plug_invalidate_ts(void)
> >> {
> >> + if (unlikely(current->flags & PF_BLOCK_TS)) {
> >> + struct blk_plug *plug = current->plug;
> >>
> >> + if (plug)
> >> + plug->cur_ktime = 0;
> >> + current->flags &= ~PF_BLOCK_TS;
> >> + }
> >> }
> >
> > If you can guarantee PF_BLOCK_TS is only ever set when current->plug,
> > this can be reduced further.
>
> Thanks for the reviews!
>
> The invariant holds at set time (the only set in blk_time_get_ns() is
> gated by if (!plug)) and through the only legitimate plug clear in
> blk_finish_plug() (which goes through __blk_flush_plug() that clears
> PF_BLOCK_TS first).
>
> However, copy_process() sets p->plug = NULL for the child but doesn't
> strip PF_BLOCK_TS from the inherited flags.
>
> I think the if(plug) is a good defensive check, but can also do the below
> if you prefer?
I think that's worth the extra few lines.
* Re: [PATCH v2] block: invalidate cached plug timestamp after task switch
From: Usama Arif @ 2026-06-12 15:47 UTC
To: Peter Zijlstra
Cc: axboe, linux-block, bsegall, dietmar.eggemann, juri.lelli,
kprateek.nayak, linux-kernel, mgorman, mingo, rostedt,
vincent.guittot, vschneid, shakeel.butt, hannes, riel,
kernel-team, stable
On 12/06/2026 16:40, Peter Zijlstra wrote:
> On Fri, Jun 12, 2026 at 11:02:58AM +0100, Usama Arif wrote:
>>
>>
>> On 12/06/2026 10:45, Peter Zijlstra wrote:
>>> On Fri, Jun 12, 2026 at 02:40:42AM -0700, Usama Arif wrote:
>>>
>>>> +static __always_inline void blk_plug_invalidate_ts(void)
>>>> {
>>>> + if (unlikely(current->flags & PF_BLOCK_TS)) {
>>>> + struct blk_plug *plug = current->plug;
>>>>
>>>> + if (plug)
>>>> + plug->cur_ktime = 0;
>>>> + current->flags &= ~PF_BLOCK_TS;
>>>> + }
>>>> }
>>>
>>> If you can guarantee PF_BLOCK_TS is only ever set when current->plug,
>>> this can be reduced further.
>>
>> Thanks for the reviews!
>>
>> The invariant holds at set time (the only set in blk_time_get_ns() is
>> gated by if (!plug)) and through the only legitimate plug clear in
>> blk_finish_plug() (which goes through __blk_flush_plug() that clears
>> PF_BLOCK_TS first).
>>
>> However, copy_process() sets p->plug = NULL for the child but doesn't
>> strip PF_BLOCK_TS from the inherited flags.
>>
>> I think the if(plug) is a good defensive check, but can also do the below
>> if you prefer?
>
> I think that's worth the extra few lines.
Ah, sorry, I didn't fully understand you. By extra lines, do you mean
keeping the if (plug) check, or getting rid of it and clearing the flag
in fork?
I thought more about it, and I think it's much nicer to get rid of the
if (plug) check and clear the flag in fork.