linux-rt-users.vger.kernel.org archive mirror
From: Gregory Haskins <ghaskins@novell.com>
To: Chirag Jog <chirag@linux.vnet.ibm.com>
Cc: linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	rostedt@goodmis.org, dvhltc@us.ibm.com, dino@in.ibm.com,
	Gilles.Carry@bull.net
Subject: [RT PATCH v3 2/2] RT: fix push_rt_task() to handle dequeue_pushable properly
Date: Mon, 06 Oct 2008 11:14:43 -0400
Message-ID: <20081006151442.8452.7838.stgit@dev.haskins.net>
In-Reply-To: <20081006151228.8452.69938.stgit@dev.haskins.net>

Chirag Jog discovered a panic in which a BUG_ON sanity check in the
new "pushable_task" logic would fire under certain circumstances:

http://lkml.org/lkml/2008/9/25/189

Gilles Carry found that the root cause was corruption of the
pushable_tasks list in the push_rt_task() logic.  double_lock_balance()
may drop the rq lock, which allows a task that is in the process of
being pushed to migrate away and thus corrupt the pushable_tasks list.
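The window opened by the dropped lock is easier to see in a small userspace model of the lock-ordering pattern that double_lock_balance() follows. This is a sketch, not kernel code: struct sim_lock_rq, sim_lock_order_requires_drop() and sim_double_lock_balance() are hypothetical names, POSIX mutexes stand in for rq spinlocks, and ordering by CPU number (the kernel compares run-queue addresses) is an assumption made so the comparison is well-defined in C.

```c
#include <pthread.h>

/* Hypothetical stand-in for a run-queue: just a lock and a CPU id. */
struct sim_lock_rq {
	pthread_mutex_t lock;
	int cpu;
};

/*
 * Assumed ordering rule: lower CPU id is locked first (the kernel orders
 * by rq address).  Returns nonzero if taking busiest->lock while holding
 * this_rq->lock would violate that order.
 */
static int sim_lock_order_requires_drop(const struct sim_lock_rq *this_rq,
					const struct sim_lock_rq *busiest)
{
	return busiest->cpu < this_rq->cpu;
}

/*
 * Caller holds this_rq->lock.  On return both locks are held.
 * Returns 1 if this_rq->lock was released and re-acquired on the way,
 * in which case state read earlier under this_rq->lock may be stale.
 */
static int sim_double_lock_balance(struct sim_lock_rq *this_rq,
				   struct sim_lock_rq *busiest)
{
	int dropped = 0;

	if (pthread_mutex_trylock(&busiest->lock) != 0) {
		if (sim_lock_order_requires_drop(this_rq, busiest)) {
			/* Release our lock so both can be taken in order. */
			pthread_mutex_unlock(&this_rq->lock);
			/* Window: other CPUs may modify this_rq right here. */
			pthread_mutex_lock(&busiest->lock);
			pthread_mutex_lock(&this_rq->lock);
			dropped = 1;
		} else {
			/* Order already respected: just wait for busiest. */
			pthread_mutex_lock(&busiest->lock);
		}
	}
	return dropped;
}
```

A return value of 1 means this_rq->lock was released and re-taken, so anything the caller read under that lock beforehand, such as which task sits at the head of the pushable list, has to be checked again.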

I traced the problem back to the recently merged pushable_tasks patch.
push_rt_task() has a "retry" path that used a compound conditional to
decide whether to retry or exit.  I missed the rationale behind the
implicit "if (!task) goto out;" portion of that compound statement
and thus did not handle it properly.  The new pushable_tasks logic
actually creates three distinct conditions:

1) an untouched and unpushable task should be dequeued
2) a migrated task where more pushable tasks remain should be retried
3) a migrated task where no more pushable tasks exist should exit
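The three conditions can be expressed as a small decision function that mirrors the checks in the new push_rt_task() code. This is a sketch: struct sim_task, enum push_outcome and classify_failed_push() are hypothetical names; next_task_on_rq stands for task_cpu(next_task) == rq->cpu, and head stands for what pick_next_pushable_task(rq) returns after the lock has been reacquired.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome names for the three conditions listed above. */
enum push_outcome {
	PUSH_DEQUEUE_AND_EXIT, /* (1) untouched and unpushable */
	PUSH_RETRY,            /* (2) something moved, more pushable tasks */
	PUSH_EXIT,             /* (3) something moved, nothing left to push */
};

/* Opaque stand-in for struct task_struct; only identity matters here. */
struct sim_task { int id; };

/*
 * next_task:       the task we tried (and failed) to push
 * next_task_on_rq: true if task_cpu(next_task) is still this CPU
 * head:            current head of this rq's pushable list (may be NULL)
 */
static enum push_outcome classify_failed_push(const struct sim_task *next_task,
					      bool next_task_on_rq,
					      const struct sim_task *head)
{
	if (next_task_on_rq && head == next_task)
		return PUSH_DEQUEUE_AND_EXIT;
	if (head == NULL)
		return PUSH_EXIT;
	return PUSH_RETRY;
}
```

A task that is still on this run-queue but is no longer at the head of the pushable list is treated as condition (2) and retried, which matches the "task == next_task" half of the new check.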

The original logic lumped (1) and (3) together, resulting in the
system dequeuing a migrated task (and against an unlocked, foreign
run-queue, no less).
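For contrast, the removed check can be modeled as a two-way decision. Again the names (struct sim_task, enum old_outcome, old_classify_failed_push()) are hypothetical; only the control flow is taken from the removed lines of the patch. It shows that an empty pushable list, condition (3), falls through to the same dequeue path as condition (1), because the old test never asks whether next_task is still on this run-queue.

```c
#include <stddef.h>

/* Hypothetical outcome names; the old code had only two exits. */
enum old_outcome {
	OLD_DEQUEUE_AND_EXIT,
	OLD_RETRY,
};

/* Opaque stand-in for struct task_struct; only identity matters here. */
struct sim_task { int id; };

/*
 * Mirrors the removed code:
 *   if (unlikely(task != next_task) && task && paranoid--) goto retry;
 *   dequeue_pushable_task(rq, next_task);
 */
static enum old_outcome old_classify_failed_push(const struct sim_task *next_task,
						 const struct sim_task *head,
						 int *paranoid)
{
	/* Same short-circuit order as the original: paranoid is only
	 * decremented when the first two terms are true. */
	if (head != next_task && head != NULL && (*paranoid)-- != 0)
		return OLD_RETRY;
	/* Reached for (1) and (3) alike: even a migrated task is dequeued. */
	return OLD_DEQUEUE_AND_EXIT;
}
```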

To fix this, we get rid of the notion of "paranoid" and handle the
three distinct conditions properly.  The paranoid limit is no longer
relevant with the new pushable logic anyway (the pushable list
naturally limits the loop), so let's just remove it.

Reported-By: Chirag Jog <chirag@linux.vnet.ibm.com>
Found-by: Gilles Carry <gilles.carry@bull.net>
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 kernel/sched_rt.c |   34 ++++++++++++++++++++++------------
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 59ead84..05a1d4a 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1056,7 +1056,6 @@ static int push_rt_task(struct rq *rq)
 {
 	struct task_struct *next_task;
 	struct rq *lowest_rq;
-	int paranoid = RT_MAX_TRIES;
 
 	if (!rq->rt.overloaded)
 		return 0;
@@ -1090,23 +1089,34 @@ static int push_rt_task(struct rq *rq)
 		struct task_struct *task;
 		/*
 		 * find lock_lowest_rq releases rq->lock
-		 * so it is possible that next_task has changed.
-		 * If it has, then try again.
+		 * so it is possible that next_task has migrated.
+		 *
+		 * We need to make sure that the task is still on the same
+		 * run-queue and is also still the next task eligible for
+		 * pushing.
 		 */
 		task = pick_next_pushable_task(rq);
-		if (unlikely(task != next_task) && task && paranoid--) {
-			put_task_struct(next_task);
-			next_task = task;
-			goto retry;
+		if (task_cpu(next_task) == rq->cpu && task == next_task) {
+			/*
+			 * If we get here, the task hasn't moved at all, but
+			 * it has failed to push.  We will not try again,
+			 * since the other cpus will pull from us when they
+			 * are ready.
+			 */
+			dequeue_pushable_task(rq, next_task);
+			goto out;
 		}
+
+		if (!task)
+			/* No more tasks, just exit */
+			goto out;
 
 		/*
-		 * Once we have failed to push this task, we will not
-		 * try again, since the other cpus will pull from us
-		 * when they are ready
+		 * Something has shifted, try again.
 		 */
-		dequeue_pushable_task(rq, next_task);
-		goto out;
+		put_task_struct(next_task);
+		next_task = task;
+		goto retry;
 	}
 
 	deactivate_task(rq, next_task, 0);
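The patched retry loop can be exercised outside the kernel with a small model. Everything below is hypothetical scaffolding rather than kernel API: the pushable list is a fixed array, every push attempt is assumed to fail, and the on_drop hook stands in for whatever another CPU does while find_lock_lowest_rq() has released rq->lock. The assert in sim_dequeue_pushable() encodes the invariant the old code violated: a task may only be dequeued from the run-queue it is actually on.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_PUSHABLE 8

/* Hypothetical, simplified models of task_struct and rq. */
struct sim_task { int id; int cpu; };

struct sim_rq {
	int cpu;
	struct sim_task *pushable[SIM_MAX_PUSHABLE]; /* head is index 0 */
	int nr_pushable;
};

static struct sim_task *sim_pick_next_pushable(struct sim_rq *rq)
{
	return rq->nr_pushable ? rq->pushable[0] : NULL;
}

static void sim_dequeue_pushable(struct sim_rq *rq, struct sim_task *p)
{
	/* The bug being fixed: dequeuing a task that is not on this rq. */
	assert(p->cpu == rq->cpu);
	for (int i = 0; i < rq->nr_pushable; i++) {
		if (rq->pushable[i] == p) {
			for (int j = i; j < rq->nr_pushable - 1; j++)
				rq->pushable[j] = rq->pushable[j + 1];
			rq->nr_pushable--;
			return;
		}
	}
}

/* Hypothetical hook: runs where rq->lock would be dropped. */
typedef void (*sim_lock_drop_fn)(struct sim_rq *rq, struct sim_task *p);

/* Example hook: another CPU pulls the current head task away. */
static void sim_steal_head(struct sim_rq *rq, struct sim_task *p)
{
	struct sim_task *head = sim_pick_next_pushable(rq);

	(void)p;
	if (head) {
		sim_dequeue_pushable(rq, head); /* removed while still local */
		head->cpu = rq->cpu + 1;        /* now belongs to another CPU */
	}
}

/* Every push attempt fails; returns the number of retries taken. */
static int sim_push_rt_task(struct sim_rq *rq, sim_lock_drop_fn on_drop)
{
	struct sim_task *next_task = sim_pick_next_pushable(rq);
	int retries = 0;

	while (next_task) {
		struct sim_task *task;

		if (on_drop)
			on_drop(rq, next_task); /* lock dropped: anything may move */

		task = sim_pick_next_pushable(rq);
		if (next_task->cpu == rq->cpu && task == next_task) {
			/* (1) untouched and unpushable: dequeue and stop */
			sim_dequeue_pushable(rq, next_task);
			break;
		}
		if (!task)
			break;          /* (3) nothing left to push */
		next_task = task;       /* (2) something shifted: retry */
		retries++;
	}
	return retries;
}
```

With sim_steal_head as the hook and two pushable tasks, the loop retries once (condition 2) and then exits on an empty list (condition 3) without touching either migrated task. Substituting the old two-way test would instead dequeue the second, already-migrated task and trip the assert.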



Thread overview: 24+ messages
2008-09-25 12:32 [BUG][PPC64] BUG in 2.6.26.5-rt9 causing Hang Chirag Jog
2008-09-29 18:13 ` Gregory Haskins
2008-09-29 21:18 ` Gregory Haskins
2008-09-29 21:34   ` Gregory Haskins
2008-09-29 22:00     ` Gregory Haskins
2008-09-30  4:43       ` Chirag Jog
2008-09-30  6:47         ` Gilles Carry
2008-10-01 14:22         ` [PATCH] sched: add a stacktrace on enqueue_pushable error Gregory Haskins
2008-10-02  9:42           ` Gilles Carry
2008-10-02 11:18   ` [BUG][PPC64] BUG in 2.6.26.5-rt9 causing Hang Gilles Carry
2008-10-03 12:42 ` [RT PATCH 0/2] fix for BUG_ON crash in 26.5-rt9 Gregory Haskins
2008-10-03 12:43   ` [PATCH 1/2] RT: Remove comment that is no longer true Gregory Haskins
2008-10-03 12:43   ` [PATCH 2/2] RT: remove "paranoid" limit in push_rt_task Gregory Haskins
2008-10-03 13:46     ` Gilles Carry
2008-10-03 15:45       ` Chirag Jog
2008-10-03 17:27         ` Gregory Haskins
2008-10-03 17:26       ` [RT PATCH v2 0/2] Series short description Gregory Haskins
2008-10-03 17:26         ` [RT PATCH v2 1/2] RT: Remove comment that is no longer true Gregory Haskins
2008-10-03 17:26         ` [RT PATCH v2 2/2] RT: remove "paranoid" limit in push_rt_task Gregory Haskins
2008-10-03 12:54   ` [RT PATCH 0/2] fix for BUG_ON crash in 26.5-rt9 Gregory Haskins
2008-10-06 15:14 ` [RT PATCH v3 0/2] Fix for "[BUG][PPC64] BUG in 2.6.26.5-rt9 causing Hang" Gregory Haskins
2008-10-06 15:14   ` [RT PATCH v3 1/2] RT: Remove comment that is no longer true Gregory Haskins
2008-10-06 15:14   ` Gregory Haskins [this message]
2008-10-07  6:04   ` [RT PATCH v3 0/2] Fix for "[BUG][PPC64] BUG in 2.6.26.5-rt9 causing Hang" Gilles Carry
