* [RFC PATCH 0/5] sched/psi: Fix PSI accounting with proxy execution
From: K Prateek Nayak @ 2025-11-17 18:55 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	John Stultz, Johannes Weiner, Suren Baghdasaryan, linux-kernel
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak

When booting into a kernel with CONFIG_SCHED_PROXY_EXEC and CONFIG_PSI,
an inconsistent task state warning was noticed soon after boot,
similar to:

    psi: inconsistent task state! task=... cpu=... psi_flags=4 clear=0 set=4

On analysis, the following sequence of events was found to be the cause
of the splat:

o Blocked task is retained on the runqueue.
o psi_sched_switch() sees task_on_rq_queued() and retains the runnable
  signals for the task.
o The task blocks later via proxy_deactivate(), but psi_dequeue() doesn't
  adjust the PSI flags since DEQUEUE_SLEEP is set, expecting
  psi_sched_switch() to fix the signals.
o The blocked task is woken up with the PSI state still reflecting that
  the task is runnable (TSK_RUNNING) leading to the splat.
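
The sequence above can be replayed in a small userspace sketch. This is
not kernel code: the flag values are an assumption based on my reading of
include/linux/psi_types.h, and toy_task, psi_change_is_consistent(),
toy_psi_flags_change() and replay_stale_running_wakeup() are hypothetical
names. The check only models the rule the warning appears to enforce: a
change must not set a bit that is already set, nor clear a bit that is
not set.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values (my reading of include/linux/psi_types.h). */
enum {
	TSK_IOWAIT           = 1 << 0,
	TSK_MEMSTALL         = 1 << 1,
	TSK_RUNNING          = 1 << 2,
	TSK_MEMSTALL_RUNNING = 1 << 3,
	TSK_ONCPU            = 1 << 4,
};

/* Hypothetical stand-in for the PSI part of a task; not task_struct. */
struct toy_task {
	unsigned int psi_flags;
};

/* Consistent only if no 'set' bit is already set and every 'clear' bit is set. */
static bool psi_change_is_consistent(unsigned int flags,
				     unsigned int clear, unsigned int set)
{
	return !(flags & set) && (flags & clear) == clear;
}

/* Apply the change regardless, and report whether it was consistent. */
static bool toy_psi_flags_change(struct toy_task *t,
				 unsigned int clear, unsigned int set)
{
	bool ok = psi_change_is_consistent(t->psi_flags, clear, set);

	t->psi_flags &= ~clear;
	t->psi_flags |= set;
	return ok;
}

/* Replay the sequence; returns true if the wakeup trips the check. */
static bool replay_stale_running_wakeup(void)
{
	struct toy_task t = { .psi_flags = 0 };
	bool ok;

	/* Initial enqueue: the task becomes runnable. */
	ok = toy_psi_flags_change(&t, 0, TSK_RUNNING);
	assert(ok);
	(void)ok;

	/*
	 * The task blocks but is retained on the runqueue: neither the
	 * switch path nor the dequeue path clears TSK_RUNNING, so the
	 * flags are left untouched here.
	 */

	/* Wakeup: TSK_RUNNING is set again while still set (flags=4 set=4). */
	return !toy_psi_flags_change(&t, 0, TSK_RUNNING);
}
```

In this model the wakeup is flagged with psi_flags=4, clear=0, set=4,
which matches the values in the splat.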


Simply tracking proxy_deactivate() is not enough since the task's
blocked_on relationship can be cleared remotely without acquiring the
runqueue lock, which can force a blocked task to run before a wakeup:
pick_next_task() picks the blocked donor and, since the blocked_on
relationship was cleared remotely, task_is_blocked() returns false,
leading to the task being run on the CPU.

If the task blocks again before it is woken up, psi_sched_switch() will
try to clear the runnable signals (TSK_RUNNING) unconditionally leading
to a different splat similar to:

    psi: inconsistent task state! task=... cpu=... psi_flags=10 clear=14 set=0
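
When reading these splats it helps to decode the masks by name. The
values are printed in hex without a prefix, so psi_flags=10 is 0x10. A
minimal decoder sketch follows, assuming the bit layout from my reading
of include/linux/psi_types.h; psi_flags_to_str() is a hypothetical
helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit layout (my reading of include/linux/psi_types.h). */
static const struct {
	unsigned int bit;
	const char *name;
} psi_flag_names[] = {
	{ 1u << 0, "TSK_IOWAIT" },
	{ 1u << 1, "TSK_MEMSTALL" },
	{ 1u << 2, "TSK_RUNNING" },
	{ 1u << 3, "TSK_MEMSTALL_RUNNING" },
	{ 1u << 4, "TSK_ONCPU" },
};

/*
 * Write a '|'-separated list of the flag names set in 'flags' into 'buf',
 * lowest bit first, or "0" if no known bit is set. Returns 'buf'.
 */
static const char *psi_flags_to_str(unsigned int flags, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(psi_flag_names) / sizeof(psi_flag_names[0]); i++) {
		int n;

		if (!(flags & psi_flag_names[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", psi_flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated, stop here */
		used += (size_t)n;
	}

	if (used == 0)
		snprintf(buf, len, "0");
	return buf;
}
```

Under that assumed layout, psi_flags=10 decodes to TSK_ONCPU and clear=14
to TSK_RUNNING|TSK_ONCPU: the switch path tries to clear TSK_RUNNING on a
task whose flags no longer carry it.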


To get around this, track the complete lifecycle of a blocked donor
right from delaying the deactivation to the wakeup. When in
blocked/donor state, PSI will consider these tasks similar to delayed
tasks - blocked but migratable.

When ttwu_runnable() finally wakes up the task, or if the donor is
deactivated via proxy_deactivate(), the proxy indicator is cleared to
show that the task is either fully blocked or fully runnable now.
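
The lifecycle described above can be pictured as a three-state machine.
The sketch below is a userspace toy, not the implementation in the
patches: every name in it (toy_state, toy_event, proxy_retained,
toy_transition) is hypothetical, and it only encodes the rule stated in
the text, namely that the indicator is set exactly while a blocked task
is retained as a donor, and that PSI treats the task as not runnable
during that window.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_state {
	TOY_RUNNABLE,      /* fully runnable */
	TOY_BLOCKED_DONOR, /* blocked, but retained on the runqueue for proxy */
	TOY_BLOCKED,       /* fully blocked, off the runqueue */
};

enum toy_event {
	TOY_EV_BLOCK_RETAINED,   /* deactivation delayed, task kept as donor */
	TOY_EV_PROXY_DEACTIVATE, /* donor is finally deactivated */
	TOY_EV_WAKEUP,           /* task is woken up */
};

struct toy_task {
	enum toy_state state;
	bool proxy_retained; /* hypothetical "proxy indicator" */
	bool psi_running;    /* is the task accounted as runnable by PSI? */
};

/* Apply one event; returns false if it is not valid in the current state. */
static bool toy_transition(struct toy_task *t, enum toy_event ev)
{
	switch (ev) {
	case TOY_EV_BLOCK_RETAINED:
		if (t->state != TOY_RUNNABLE)
			return false;
		t->state = TOY_BLOCKED_DONOR;
		t->proxy_retained = true;
		t->psi_running = false; /* treated like a delayed task */
		break;
	case TOY_EV_PROXY_DEACTIVATE:
		if (t->state != TOY_BLOCKED_DONOR)
			return false;
		t->state = TOY_BLOCKED;
		t->proxy_retained = false;
		break;
	case TOY_EV_WAKEUP:
		if (t->state == TOY_RUNNABLE)
			return false;
		t->state = TOY_RUNNABLE;
		t->proxy_retained = false;
		t->psi_running = true;
		break;
	default:
		return false;
	}

	/* The indicator and the PSI view must always agree with the state. */
	assert(t->proxy_retained == (t->state == TOY_BLOCKED_DONOR));
	assert(t->psi_running == (t->state == TOY_RUNNABLE));
	return true;
}
```

Starting from a runnable task, TOY_EV_BLOCK_RETAINED followed by either
TOY_EV_WAKEUP or TOY_EV_PROXY_DEACTIVATE always leaves proxy_retained
cleared, so a later wakeup or switch sees a PSI state that matches what
the task is actually doing.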

Patches 1 and 2 are cleanups to make life slightly easier when auditing
the implementation and inspecting the debug logs. Patches 3 to 5 implement
the tracking of donor states and a couple of fixes on top.

The series was tested on top of tip:sched/core for a while running
sched-messaging without observing any inconsistent task state warnings,
and should apply cleanly on top of:

    git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core

at commit 33cf66d88306 ("sched/fair: Proportional newidle balance").

---
K Prateek Nayak (5):
  sched/psi: Make psi stubs consistent for !CONFIG_PSI
  sched/psi: Prepend "0x" to format specifiers when printing PSI flags
  sched/core: Track blocked tasks retained on rq for proxy
  sched/core: Block proxy task on pick when blocked_on is cleared before
    wakeup
  sched/psi: Fix PSI signals of blocked tasks retained for proxy

 include/linux/sched.h |  4 +++
 kernel/sched/core.c   | 59 +++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/psi.c    |  4 +--
 kernel/sched/sched.h  |  2 ++
 kernel/sched/stats.h  |  6 ++---
 5 files changed, 68 insertions(+), 7 deletions(-)


base-commit: 33cf66d88306663d16e4759e9d24766b0aaa2e17
-- 
2.34.1



Thread overview: 21+ messages
2025-11-17 18:55 [RFC PATCH 0/5] sched/psi: Fix PSI accounting with proxy execution K Prateek Nayak
2025-11-17 18:55 ` [PATCH 1/5] sched/psi: Make psi stubs consistent for !CONFIG_PSI K Prateek Nayak
2025-11-18  1:06   ` John Stultz
2025-11-20  5:59   ` Madadi Vineeth Reddy
2025-11-20  6:10     ` K Prateek Nayak
2025-11-20  6:22       ` Madadi Vineeth Reddy
2025-12-02 14:32   ` Johannes Weiner
2025-11-17 18:55 ` [PATCH 2/5] sched/psi: Prepend "0x" to format specifiers when printing PSI flags K Prateek Nayak
2025-11-18  1:08   ` John Stultz
2025-12-02 14:33   ` Johannes Weiner
2025-11-17 18:55 ` [RFC PATCH 3/5] sched/core: Track blocked tasks retained on rq for proxy K Prateek Nayak
2025-11-17 20:44   ` kernel test robot
2025-11-18  2:03     ` K Prateek Nayak
2025-11-18  1:46   ` kernel test robot
2025-11-18  4:38   ` K Prateek Nayak
2025-11-17 18:55 ` [RFC PATCH 4/5] sched/core: Block proxy task on pick when blocked_on is cleared before wakeup K Prateek Nayak
2025-11-17 18:55 ` [RFC PATCH 5/5] sched/psi: Fix PSI signals of blocked tasks retained for proxy K Prateek Nayak
2025-11-18  0:45 ` [RFC PATCH 0/5] sched/psi: Fix PSI accounting with proxy execution John Stultz
2025-11-18  1:38   ` K Prateek Nayak
2025-11-18  4:26     ` John Stultz
2025-11-18  5:08       ` K Prateek Nayak
