* [PATCH] possible scheduler deadlock in 2.6.16
From: Anton Blanchard @ 2006-03-22 10:41 UTC (permalink / raw)
To: linux-kernel; +Cc: akpm, linuxppc-dev, mingo
Hi,
We have noticed lockups during boot when stress testing kexec on ppc64.
Two cpus would deadlock in scheduler code, each trying to grab a
spinlock the other already held.
The double_rq_lock code uses the addresses of the runqueues to order the
taking of multiple locks. Each runqueue is a per cpu variable, so its
address depends on the per cpu layout:
	if (rq1 < rq2) {
		spin_lock(&rq1->lock);
		spin_lock(&rq2->lock);
	} else {
		spin_lock(&rq2->lock);
		spin_lock(&rq1->lock);
	}
On the other hand, the code in wake_sleeping_dependent grabs the locks
in cpu id order:
	for_each_cpu_mask(i, sibling_map)
		spin_lock(&cpu_rq(i)->lock);
This means we rely on the addresses of per cpu data increasing as cpu
ids increase. While this will be true for the generic percpu
implementation, it may not be true for arch-specific implementations.
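To illustrate the failure mode, here is a minimal user-space sketch of
the resulting ABBA deadlock (illustrative only, not kernel code; the
mutexes stand in for runqueue locks, built with gcc abba.c -pthread).
Thread A takes the locks in "address order" and thread B in "cpu id
order"; when the two orders disagree, both threads hang:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Pretend cpu 1's runqueue has the LOWER address... */
static pthread_mutex_t rq_cpu1 = PTHREAD_MUTEX_INITIALIZER;
/* ...and cpu 0's the HIGHER one, so the two orderings conflict. */
static pthread_mutex_t rq_cpu0 = PTHREAD_MUTEX_INITIALIZER;

static void *address_order(void *unused)  /* like double_rq_lock */
{
	pthread_mutex_lock(&rq_cpu1);   /* lower address first */
	sleep(1);                       /* widen the race window */
	pthread_mutex_lock(&rq_cpu0);   /* blocks forever */
	pthread_mutex_unlock(&rq_cpu0);
	pthread_mutex_unlock(&rq_cpu1);
	return NULL;
}

static void *cpu_id_order(void *unused)  /* like wake_sleeping_dependent */
{
	pthread_mutex_lock(&rq_cpu0);   /* lower cpu id first */
	sleep(1);
	pthread_mutex_lock(&rq_cpu1);   /* blocks forever */
	pthread_mutex_unlock(&rq_cpu1);
	pthread_mutex_unlock(&rq_cpu0);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, address_order, NULL);
	pthread_create(&b, NULL, cpu_id_order, NULL);
	pthread_join(a, NULL);  /* never returns: ABBA deadlock */
	pthread_join(b, NULL);
	printf("not reached\n");
	return 0;
}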
One way to solve this is to always take runqueues in cpu id order. To do
this we add a cpu variable to the runqueue and check it in the
double runqueue locking functions.
Thoughts?
Anton
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: build/kernel/sched.c
===================================================================
--- build.orig/kernel/sched.c	2006-03-22 18:46:53.000000000 +1100
+++ build/kernel/sched.c	2006-03-22 20:44:20.000000000 +1100
@@ -237,6 +237,7 @@ struct runqueue {
 	task_t *migration_thread;
 	struct list_head migration_queue;
+	int cpu;
 #endif
 #ifdef CONFIG_SCHEDSTATS
@@ -1660,6 +1661,9 @@ unsigned long nr_iowait(void)
 /*
  * double_rq_lock - safely lock two runqueues
  *
+ * We must take them in cpu order to match code in
+ * dependent_sleeper and wake_dependent_sleeper.
+ *
  * Note this does not disable interrupts like task_rq_lock,
  * you need to do so manually before calling.
  */
@@ -1671,7 +1675,7 @@ static void double_rq_lock(runqueue_t *r
 		spin_lock(&rq1->lock);
 		__acquire(rq2->lock);	/* Fake it out ;) */
 	} else {
-		if (rq1 < rq2) {
+		if (rq1->cpu < rq2->cpu) {
 			spin_lock(&rq1->lock);
 			spin_lock(&rq2->lock);
 		} else {
@@ -1707,7 +1711,7 @@ static void double_lock_balance(runqueue
 	__acquires(this_rq->lock)
 {
 	if (unlikely(!spin_trylock(&busiest->lock))) {
-		if (busiest < this_rq) {
+		if (busiest->cpu < this_rq->cpu) {
 			spin_unlock(&this_rq->lock);
 			spin_lock(&busiest->lock);
 			spin_lock(&this_rq->lock);
@@ -6035,6 +6039,7 @@ void __init sched_init(void)
 		rq->push_cpu = 0;
 		rq->migration_thread = NULL;
 		INIT_LIST_HEAD(&rq->migration_queue);
+		rq->cpu = i;
 #endif
 		atomic_set(&rq->nr_iowait, 0);
* Re: [PATCH] possible scheduler deadlock in 2.6.16
From: Nick Piggin @ 2006-03-22 11:09 UTC (permalink / raw)
To: Anton Blanchard; +Cc: akpm, linuxppc-dev, mingo, linux-kernel
Anton Blanchard wrote:
> One way to solve this is to always take runqueues in cpu id order. To do
> this we add a cpu variable to the runqueue and check it in the
> double runqueue locking functions.
>
> Thoughts?
>
You're right. I can't think of a better fix, although we've been trying
to avoid adding cpu to the runqueue structure.
I was going to suggest moving more work into wake_sleeping_dependent
instead, but cores with 4 or more threads now make that less desirable,
I suppose.
--
SUSE Labs, Novell Inc.
* Re: [PATCH] possible scheduler deadlock in 2.6.16
From: Anton Blanchard @ 2006-03-22 12:17 UTC (permalink / raw)
To: Nick Piggin; +Cc: akpm, linuxppc-dev, mingo, linux-kernel
Hi Nick,
> You're right. I can't think of a better fix, although we've been trying
> to avoid adding cpu to the runqueue structure.
>
> I was going to suggest moving more work into wake_sleeping_dependent
> instead, but cores with 4 or more threads now make that less desirable,
> I suppose.
My thoughts too. I wasn't sure if davem is planning to use the sibling
code for his niagara work, but locking us down to 2 siblings sounds like
a bad idea.
Anton
* Re: [PATCH] possible scheduler deadlock in 2.6.16
From: Ingo Molnar @ 2006-03-22 13:16 UTC (permalink / raw)
To: Anton Blanchard; +Cc: akpm, linuxppc-dev, linux-kernel
* Anton Blanchard <anton@samba.org> wrote:
> One way to solve this is to always take runqueues in cpu id order. To
> do this we add a cpu variable to the runqueue and check it in the
> double runqueue locking functions.
>
> Thoughts?
it's fine with me - the overhead to double_rq_lock() is minimal, and
it's not critical code either.
Acked-by: Ingo Molnar <mingo@elte.hu>
Ingo
* Re: [PATCH] possible scheduler deadlock in 2.6.16
From: Peter Williams @ 2006-03-22 22:52 UTC (permalink / raw)
To: Nick Piggin; +Cc: akpm, linuxppc-dev, mingo, linux-kernel
Nick Piggin wrote:
> Anton Blanchard wrote:
>
>> One way to solve this is to always take runqueues in cpu id order. To do
>> this we add a cpu variable to the runqueue and check it in the
>> double runqueue locking functions.
>>
>> Thoughts?
>>
>
> You're right. I can't think of a better fix, although we've been trying
> to avoid adding cpu to the runqueue structure.
But now that it's there it will enable further optimizations in parts of
sched.c, won't it? E.g. there are a number of functions that get
passed both the run queue and the CPU id as arguments, and these could be
simplified.
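Something like this hypothetical sketch (names made up, not actual
sched.c code) is what I have in mind:

struct runqueue_example {
	int cpu;		/* the field added by the patch */
	/* ... */
};

/* Before, the caller had to keep rq and cpu consistent itself:
 *	example_tick(rq, cpu);
 * After, the id travels with the runqueue, so one argument suffices:
 */
static void example_tick(struct runqueue_example *rq)
{
	int cpu = rq->cpu;	/* recover the id from the runqueue */
	(void)cpu;		/* ... per-cpu work as before ... */
}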
Peter
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce