From: "Amit K. Arora" <aarora@linux.vnet.ibm.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>, Ingo Molnar <mingo@elte.hu>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Gautham R Shenoy <ego@in.ibm.com>,
	Darren Hart <dvhltc@us.ibm.com>,
	Brian King <brking@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: [PATCH] Make sure timers have migrated before killing migration_thread
Date: Wed, 19 May 2010 14:35:57 +0530
Message-ID: <20100519090557.GA15237@amitarora.in.ibm.com>

Problem : In a stress test that ran heavy workloads alongside repeated CPU
offlining and onlining, a hang was observed. The system appears to hang at
the point where migration_call() tries to kill the migration_thread of the
dying CPU, which has just been moved to the current CPU. This migration
thread never gets a chance to run (and die) because rt_throttled is set to 1
on the current CPU, and it is never cleared because the hrtimer that is
supposed to reset the rt bandwidth (sched_rt_period_timer) is tied to the
CPU being offlined.

Solution : This patch defers killing the migration thread to the
"CPU_POST_DEAD" event. By then all the timers (including
sched_rt_period_timer) should have been migrated (along with other
callbacks).

Alternate Solution considered : Another option was to increase the
priority of the hrtimer CPU offline notifier so that it runs before the
scheduler's migration CPU offline notifier. That way the timers are
guaranteed to be migrated before migration_call() tries to kill the
migration_thread. However, Srivatsa suggested that this could have some
non-obvious implications.
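
To make the ordering argument above concrete, here is a small userspace
model of the offline sequence. It is only a sketch, not kernel code: the
struct, the enum and every function name are made up for illustration, and
it assumes that the hrtimer callback migrates timers on CPU_DEAD and that
the scheduler callback runs first within each event unless told otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two hotplug events discussed above. */
enum hp_event { EV_CPU_DEAD, EV_CPU_POST_DEAD };

/* Toy state; none of these fields are real kernel structures. */
struct model {
	bool timers_migrated;	/* set once the hrtimer callback has run */
	bool rt_throttled;	/* RT bandwidth exhausted on the current CPU */
	bool thread_stopped;	/* migration thread has run and exited */
	bool hung;		/* stop attempt can never complete */
};

/*
 * Toy stand-in for kthread_stop(): the RT migration thread can only run
 * (and exit) if the throttle can still be lifted, which in this model
 * requires the period timer to have been migrated off the dead CPU.
 */
static void try_stop_migration_thread(struct model *m)
{
	if (m->rt_throttled && !m->timers_migrated) {
		m->hung = true;
		return;
	}
	m->rt_throttled = false;	/* period timer fired, throttle lifted */
	m->thread_stopped = true;
}

/* Scheduler callback: stops the thread on the event chosen by the caller. */
static void sched_cb(struct model *m, enum hp_event ev, enum hp_event stop_on)
{
	if (ev == stop_on)
		try_stop_migration_thread(m);
}

/* hrtimer callback: assumed to migrate timers when it sees CPU_DEAD. */
static void hrtimer_cb(struct model *m, enum hp_event ev)
{
	if (ev == EV_CPU_DEAD)
		m->timers_migrated = true;
}

/*
 * Replay CPU_DEAD followed by CPU_POST_DEAD. hrtimer_first models the
 * alternate solution (raising the hrtimer notifier's priority).
 */
static struct model offline_cpu(enum hp_event stop_on, bool hrtimer_first)
{
	struct model m = { .rt_throttled = true };
	const enum hp_event seq[] = { EV_CPU_DEAD, EV_CPU_POST_DEAD };

	for (size_t i = 0; i < sizeof(seq) / sizeof(seq[0]); i++) {
		if (hrtimer_first)
			hrtimer_cb(&m, seq[i]);
		sched_cb(&m, seq[i], stop_on);
		if (m.hung)
			break;
		if (!hrtimer_first)
			hrtimer_cb(&m, seq[i]);
	}
	return m;
}
```

In this model, offline_cpu(EV_CPU_DEAD, false) ends with hung set, while
both offline_cpu(EV_CPU_POST_DEAD, false) (this patch) and
offline_cpu(EV_CPU_DEAD, true) (the alternate solution) end with
thread_stopped set.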

Testing : Without the patch, the stress tests did not last even 12
hours, and the problem was reproducible. With the patch applied, the
tests ran successfully for more than 48 hours.

Thanks!
--
Regards,
Amit Arora

 Signed-off-by: Amit Arora <aarora@in.ibm.com>
 Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
--
diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
--- linux-2.6.34.org/kernel/sched.c	2010-05-18 22:56:21.000000000 -0700
+++ linux-2.6.34/kernel/sched.c	2010-05-18 22:58:31.000000000 -0700
@@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
 		cpu_rq(cpu)->migration_thread = NULL;
 		break;
 
+	case CPU_POST_DEAD:
+		/*
+		  Bring the migration thread down in CPU_POST_DEAD event,
+		  since the timers should have got migrated by now and thus
+		  we should not see a deadlock between trying to kill the
+		  migration thread and the sched_rt_period_timer.
+		*/
+		cpuset_lock();
+		rq = cpu_rq(cpu);
+		kthread_stop(rq->migration_thread);
+		put_task_struct(rq->migration_thread);
+		rq->migration_thread = NULL;
+		cpuset_unlock();
+		break;
+
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);
-		kthread_stop(rq->migration_thread);
-		put_task_struct(rq->migration_thread);
-		rq->migration_thread = NULL;
 		/* Idle task back to normal (off runqueue, low prio) */
 		raw_spin_lock_irq(&rq->lock);
 		update_rq_clock(rq);

Thread overview: 19+ messages
2010-05-19  9:05 Amit K. Arora [this message]
2010-05-19  9:31 ` [PATCH] Make sure timers have migrated before killing migration_thread Peter Zijlstra
2010-05-19 12:13   ` [PATCH v2] " Amit K. Arora
2010-05-20  7:28     ` Peter Zijlstra
2010-05-23  9:07       ` Mike Galbraith
2010-05-23  9:13         ` Peter Zijlstra
2010-05-24  6:43       ` Amit K. Arora
2010-05-25 20:19       ` Thomas Gleixner
2010-05-26  6:43         ` Peter Zijlstra
2010-05-24  9:59   ` [PATCH] " Amit K. Arora
2010-05-24 13:28     ` Peter Zijlstra
2010-05-24 15:16       ` Srivatsa Vaddagiri
2010-05-24 15:55         ` Peter Zijlstra
2010-05-25 11:31     ` Peter Zijlstra
2010-05-25 12:10       ` Amit K. Arora
2010-05-25 13:23       ` [PATCH v3] " Amit K. Arora
2010-05-25 14:22         ` Peter Zijlstra
2010-05-25 16:27         ` Tejun Heo
2010-05-31  7:18         ` [tip:sched/urgent] sched: Make sure timers have migrated before killing the migration_thread tip-bot for Amit K. Arora
