* [PATCH replace] Fix pushable_tasks list corruption
@ 2008-10-03 8:24 Gilles Carry
2008-10-03 12:50 ` Gregory Haskins
0 siblings, 1 reply; 2+ messages in thread
From: Gilles Carry @ 2008-10-03 8:24 UTC (permalink / raw)
To: linux-rt-users
Cc: rostedt, tinytim, jean-pierre.dion, sebastien.dugue, gilles.carry,
ghaskins
From: gilles.carry <gilles.carry@bull.net>
Symptoms:
System hang (endless loop in plist_check_list)
or BUG because of faulty prev/next pointers in
pushable_task node.
When push_rt_task successes finding a task to push away, it
performs a double lock on the runqueues (local and target) but
before getting both locks, it releases the local rq lock allowing
other cpus grab the task in between. (eg. pull_rt_task, timers...)
In some situations, when push_rt_task calls dequeue_pushable_task
the task may have already been removed from the pushable_tasks
list by another cpu. Removing the node again corrupts the list.
This patch adds a sanity check to dequeue_pushable_task which only
removes the node if it's still on the original list.
Signed-off-by: Gilles Carry <gilles.carry@bull.net>
Cc: ghaskins@novell.com
---
kernel/sched_rt.c | 18 +++++++++++++++++-
1 files changed, 17 insertions(+), 1 deletions(-)
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 57a0c0d..7aa4450 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -62,7 +62,17 @@ static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
{
- plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
+ struct plist_node *next;
+
+ /* Sanity check: delete only if node is still on this list */
+ plist_for_each(next, &rq->rt.pushable_tasks) {
+ if (&p->pushable_tasks == next) {
+ plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
+ return;
+ }
+ }
+
+
}
#else
@@ -1105,6 +1115,12 @@ static int push_rt_task(struct rq *rq)
* try again, since the other cpus will pull from us
* when they are ready
*/
+
+ /*
+ * If we reach here and the task has migrated to another cpu
+ * (paranoid == 0?) calling dequeue_pushable_task may cause
+ * pushable_tasks list corruption.
+ */
dequeue_pushable_task(rq, next_task);
goto out;
}
--
1.5.5.GIT
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH replace] Fix pushable_tasks list corruption
2008-10-03 8:24 [PATCH replace] Fix pushable_tasks list corruption Gilles Carry
@ 2008-10-03 12:50 ` Gregory Haskins
0 siblings, 0 replies; 2+ messages in thread
From: Gregory Haskins @ 2008-10-03 12:50 UTC (permalink / raw)
To: Gilles Carry
Cc: linux-rt-users, rostedt, tinytim, jean-pierre.dion,
sebastien.dugue
[-- Attachment #1: Type: text/plain, Size: 2800 bytes --]
Hi Gilles,
Gilles Carry wrote:
> From: gilles.carry <gilles.carry@bull.net>
>
> Symptoms:
> System hang (endless loop in plist_check_list)
> or BUG because of faulty prev/next pointers in
> pushable_task node.
>
> When push_rt_task successes finding a task to push away, it
> performs a double lock on the runqueues (local and target) but
> before getting both locks, it releases the local rq lock allowing
> other cpus grab the task in between. (eg. pull_rt_task, timers...)
> In some situations, when push_rt_task calls dequeue_pushable_task
> the task may have already been removed from the pushable_tasks
> list by another cpu. Removing the node again corrupts the list.
>
> This patch adds a sanity check to dequeue_pushable_task which only
> removes the node if it's still on the original list.
>
I think your analysis is spot on, though I am still concerned about your
implementation. It still "papers over" the issue, so to speak, which is
that the item moved to a new RQ. In addition, it adds a relatively
expensive check to the fast path of the list dequeue.
Based on this, I have submitted patches that address your finding but in
a more targeted and efficient approach (crediting you and Chirag) here:
http://lkml.org/lkml/2008/10/3/130
So I have to respectfully NACK this patch, but I do thank you for
investigating and pointing me at the actual root problem. It is
sincerely appreciated.
Regards,
-Greg
> Signed-off-by: Gilles Carry <gilles.carry@bull.net>
> Cc: ghaskins@novell.com
> ---
> kernel/sched_rt.c | 18 +++++++++++++++++-
> 1 files changed, 17 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index 57a0c0d..7aa4450 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -62,7 +62,17 @@ static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
>
> static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
> {
> - plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
> + struct plist_node *next;
> +
> + /* Sanity check: delete only if node is still on this list */
> + plist_for_each(next, &rq->rt.pushable_tasks) {
> + if (&p->pushable_tasks == next) {
> + plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
> + return;
> + }
> + }
> +
> +
> }
>
> #else
> @@ -1105,6 +1115,12 @@ static int push_rt_task(struct rq *rq)
> * try again, since the other cpus will pull from us
> * when they are ready
> */
> +
> + /*
> + * If we reach here and the task has migrated to another cpu
> + * (paranoid == 0?) calling dequeue_pushable_task may cause
> + * pushable_tasks list corruption.
> + */
> dequeue_pushable_task(rq, next_task);
> goto out;
> }
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2008-10-03 12:46 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-03 8:24 [PATCH replace] Fix pushable_tasks list corruption Gilles Carry
2008-10-03 12:50 ` Gregory Haskins
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).