* [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying()
@ 2021-01-13 18:31 Valentin Schneider
  2021-01-13 22:02 ` Paul E. McKenney
  2021-01-22 17:41 ` [tip: sched/urgent] " tip-bot2 for Valentin Schneider
  0 siblings, 2 replies; 10+ messages in thread
From: Valentin Schneider @ 2021-01-13 18:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: paulmck, peterz, mingo, tglx, jiangshanlai, cai,
	vincent.donnefort, decui, vincent.guittot, rostedt, tj

Since commit

  1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")

tasks are expected to move themselves out of an out-going CPU. For most
tasks this will be done automagically via BALANCE_PUSH, but percpu kthreads
will have to cooperate and move themselves away one way or another.

Currently, some percpu kthreads (workqueues being a notable example) do not
cooperate nicely and can end up on an out-going CPU at the time
sched_cpu_dying() is invoked.

Print the dying rq's tasks to shed some light on the stragglers.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
As Peter pointed out, this should really be caught much earlier than
sched_cpu_dying().

If we go down the route of preventing kthreads from being affined to
!active CPUs in __set_cpus_allowed_ptr() (genuine percpu kthreads sidestep
it via kthread_bind_mask()), then I *think* we could catch this in wakeups,
i.e. select_task_rq(). I've been playing around there, but it's not as
straightforward as I'd have hoped.
---
 kernel/sched/core.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9a08a39d7cdb..d784dd1ae436 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7564,6 +7564,25 @@ static void calc_load_migrate(struct rq *rq)
 		atomic_long_add(delta, &calc_load_tasks);
 }
 
+static void dump_rq_tasks(struct rq *rq, const char *loglvl)
+{
+	struct task_struct *g, *p;
+	int cpu = cpu_of(rq);
+
+	lockdep_assert_held(&rq->lock);
+
+	printk("%sCPU%d enqueued tasks (%u total):\n", loglvl, cpu, rq->nr_running);
+	for_each_process_thread(g, p) {
+		if (task_cpu(p) != cpu)
+			continue;
+
+		if (!task_on_rq_queued(p))
+			continue;
+
+		printk("%s\tpid: %d, name: %s\n", loglvl, p->pid, p->comm);
+	}
+}
+
 int sched_cpu_dying(unsigned int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -7573,7 +7592,10 @@ int sched_cpu_dying(unsigned int cpu)
 	sched_tick_stop(cpu);
 
 	rq_lock_irqsave(rq, &rf);
-	BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq));
+	if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
+		WARN(true, "Dying CPU not properly vacated!");
+		dump_rq_tasks(rq, KERN_WARNING);
+	}
 	rq_unlock_irqrestore(rq, &rf);
 
 	calc_load_migrate(rq);
-- 
2.27.0
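
Editorial sketch (not part of the patch): the filter that dump_rq_tasks()
applies can be modelled in userspace as below. This is only an illustration
under stated assumptions. struct fake_task, dump_cpu_tasks() and
cpu_properly_vacated() are hypothetical names that do not exist in the
kernel; the array walk stands in for for_each_process_thread(), and the
cpu and queued fields stand in for task_cpu(p) and task_on_rq_queued(p).
Locking (the patch asserts that rq->lock is held) is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the task_struct fields the patch reads. */
struct fake_task {
	int pid;
	const char *comm;
	int cpu;	/* models task_cpu(p) */
	bool queued;	/* models task_on_rq_queued(p) */
};

/*
 * Same filter as dump_rq_tasks(): report each task that is assigned to
 * @cpu and is still queued. Returns the number of tasks reported.
 */
size_t dump_cpu_tasks(FILE *out, const struct fake_task *tasks,
		      size_t nr_tasks, int cpu)
{
	size_t reported = 0;

	for (size_t i = 0; i < nr_tasks; i++) {
		const struct fake_task *p = &tasks[i];

		if (p->cpu != cpu)
			continue;

		if (!p->queued)
			continue;

		fprintf(out, "\tpid: %d, name: %s\n", p->pid, p->comm);
		reported++;
	}

	return reported;
}

/*
 * Negation of the condition that triggers the WARN() in the patched
 * sched_cpu_dying(): exactly one runnable task and no pinned tasks.
 */
bool cpu_properly_vacated(unsigned int nr_running, bool has_pinned_tasks)
{
	return nr_running == 1 && !has_pinned_tasks;
}
```

A caller would fill an array of struct fake_task, test
cpu_properly_vacated() for the CPU in question, and call dump_cpu_tasks()
only when it returns false, mirroring the order of operations in the
patched sched_cpu_dying().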


Thread overview: 10+ messages
2021-01-13 18:31 [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying() Valentin Schneider
2021-01-13 22:02 ` Paul E. McKenney
2021-01-14  0:15   ` Valentin Schneider
2021-01-14  0:36     ` Paul E. McKenney
2021-01-14 10:37       ` Valentin Schneider
2021-01-14 15:22         ` Paul E. McKenney
2021-01-14 15:52           ` Valentin Schneider
2021-01-14 17:13             ` Paul E. McKenney
2021-01-15 16:54               ` Paul E. McKenney
2021-01-22 17:41 ` [tip: sched/urgent] " tip-bot2 for Valentin Schneider
