Linux block layer
* [PATCH] sched: flush plug in schedule_preempt_disabled() to prevent deadlock
From: Ming Lei @ 2026-05-12  8:59 UTC
  To: Jens Axboe, linux-block, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Ming Lei, Michael Wu, Xiaosen He

On preemptible kernels, a deadlock can occur when a task with plugged IO
calls schedule_preempt_disabled():

  schedule_preempt_disabled()
    sched_preempt_enable_no_resched()  // preemption now enabled
    schedule()                         // <-- preemption can happen here
      sched_submit_work()
        blk_flush_plug()

After sched_preempt_enable_no_resched() re-enables preemption, the task
can be preempted (e.g., by a higher-priority RT task) before reaching
blk_flush_plug() in sched_submit_work(). Since the task's state is
already TASK_UNINTERRUPTIBLE (set by the mutex/rwsem slowpath caller),
requests in current->plug remain unflushed for an unbounded time.

If another task depends on those plugged requests to make progress (e.g.,
to release a lock the sleeping task needs), a deadlock results:

  - Task A (writeback worker): has plugged IO pending, is preempted
    before flushing it, and sits on the run queue behind
    higher-priority work
  - Task B: waits for completion of the IO still held in Task A's plug,
    while holding the lock that Task A is sleeping on
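
The dependency between the two tasks can be sketched as a tiny wait-for
graph. This is a userspace illustration only, not kernel code; every
name in it (model_has_wait_cycle, TASK_A, TASK_B, MODEL_NOBODY) is
hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: two tasks, each waiting on at most one other task. */
enum { MODEL_NR_TASKS = 2, TASK_A = 0, TASK_B = 1, MODEL_NOBODY = -1 };

/*
 * waits_on[i] names the task that must act before task i can make
 * progress, or MODEL_NOBODY if task i depends on no other task.
 * Returns true if following the edges from 'start' leads back to it.
 */
static bool model_has_wait_cycle(const int waits_on[MODEL_NR_TASKS], int start)
{
	int cur = start;

	assert(start >= 0 && start < MODEL_NR_TASKS);

	for (int hops = 0; hops < MODEL_NR_TASKS; hops++) {
		cur = waits_on[cur];
		if (cur == MODEL_NOBODY)
			return false;	/* chain ends, somebody can progress */
		if (cur == start)
			return true;	/* came back around: deadlock */
	}
	return false;
}
```

With waits_on = { TASK_B, TASK_A } (A needs B's lock, B needs the IO in
A's plug) the function reports a cycle. Once the plug is flushed, B
depends only on the device, i.e. waits_on = { TASK_B, MODEL_NOBODY },
and the cycle is gone.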

Both reported deadlocks involve mutex/rwsem slowpaths, which are the
primary callers of schedule_preempt_disabled() with non-running task
state.

Fix by flushing the plug in schedule_preempt_disabled() while
preemption is still disabled. This ensures the plug is empty before the
preemption window opens.
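
To make the ordering argument concrete, here is a minimal userspace
model of the two orderings. It is a sketch under stated assumptions:
the model_* types and functions are hypothetical stand-ins, not kernel
APIs, and preemption is reduced to recording how many requests are
still plugged at the moment preemption would be re-enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_plug {
	size_t nr_pending;		/* requests queued, not yet submitted */
};

struct model_task {
	bool running;			/* models task_is_running(current) */
	struct model_plug *plug;	/* models current->plug, may be NULL */
	size_t pending_at_window;	/* plug depth when preemption turned on */
};

/* Models blk_flush_plug(): submit everything, tolerate a NULL plug. */
static void model_flush_plug(struct model_plug *plug)
{
	if (plug)
		plug->nr_pending = 0;
}

/* Records what a preempting task would find still queued in the plug. */
static void model_preempt_window_opens(struct model_task *t)
{
	t->pending_at_window = t->plug ? t->plug->nr_pending : 0;
}

/*
 * Models schedule_preempt_disabled(). With patched == false the only
 * flush is the one reached through schedule(), after the window opened.
 */
static void model_schedule_preempt_disabled(struct model_task *t, bool patched)
{
	if (patched && !t->running)
		model_flush_plug(t->plug);	/* the two added lines */

	model_preempt_window_opens(t);	/* sched_preempt_enable_no_resched() */

	if (!t->running)
		model_flush_plug(t->plug);	/* schedule() -> sched_submit_work() */

	/* Either way, a sleeping task ends up with an empty plug. */
	assert(!t->plug || t->running || t->plug->nr_pending == 0);
}
```

For a non-running task with three plugged requests, the unpatched
ordering leaves pending_at_window at 3, while the patched ordering
leaves it at 0. The model says nothing about how long a real preemption
lasts; it only shows which requests are exposed to that window.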

Fixes: 73c101011926 ("block: initial patch for on-stack per-task plugging")
Reported-by: Michael Wu <michael@allwinnertech.com>
Tested-by: Michael Wu <michael@allwinnertech.com>
Reported-by: Xiaosen He <xiaosen.he@oss.qualcomm.com>
Link: https://lore.kernel.org/linux-block/20260417082744.30124-1-michael@allwinnertech.com/
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c6..c1efe110c54d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7336,6 +7336,8 @@ asmlinkage __visible void __sched schedule_user(void)
  */
 void __sched schedule_preempt_disabled(void)
 {
+	if (!task_is_running(current))
+		blk_flush_plug(current->plug, true);
 	sched_preempt_enable_no_resched();
 	schedule();
 	preempt_disable();
-- 
2.53.0



Thread overview: 7+ messages
2026-05-12  8:59 [PATCH] sched: flush plug in schedule_preempt_disabled() to prevent deadlock Ming Lei
2026-05-12 12:04 ` Peter Zijlstra
2026-05-12 12:40   ` Peter Zijlstra
2026-05-12 15:45     ` Ming Lei
2026-05-12 16:49       ` Peter Zijlstra
2026-05-12 16:53       ` Peter Zijlstra
2026-05-12 17:16       ` Tejun Heo
