* [BUG] CFS vs cpu hotplug
@ 2008-06-19 16:19 Heiko Carstens
2008-06-19 18:05 ` Peter Zijlstra
2008-06-25 22:12 ` Dmitry Adamushko
0 siblings, 2 replies; 28+ messages in thread
From: Heiko Carstens @ 2008-06-19 16:19 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Avi Kivity; +Cc: linux-kernel
Hi Ingo, Peter,
I'm still seeing kernel crashes on cpu hotplug with Linus' current git tree.
All I have to do is to make all cpus busy (make -j4 of the kernel source is
sufficient) and then start cpu hotplug stress.
It usually takes less than a minute to crash the system like this:
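[Editor's note: "cpu hotplug stress" here means toggling CPUs offline and online via sysfs in a loop. A minimal sketch of such a driver follows; it is an illustration only — it prints the command sequence instead of executing it, since actually offlining CPUs needs root and CONFIG_HOTPLUG_CPU, and the CPU numbers and iteration count are arbitrary.]

```shell
#!/bin/sh
# Sketch of a cpu hotplug stress loop. Illustration only: builds and
# prints the offline/online command sequence rather than executing it.
# cpu0 typically cannot be taken offline, so it is skipped.
SYSFS=/sys/devices/system/cpu
cmds=""
for i in 1 2 3; do                    # iteration count is arbitrary
    for cpu in 1 2 3; do              # secondary CPUs only
        cmds="$cmds
echo 0 > $SYSFS/cpu$cpu/online
echo 1 > $SYSFS/cpu$cpu/online"
    done
done
printf '%s\n' "$cmds"
```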
Unable to handle kernel pointer dereference at virtual kernel address 005a800000031000
Oops: 0038 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 Not tainted 2.6.26-rc6-00232-g9bedbcb #356
Process swapper (pid: 0, task: 000000002fe7ccf8, ksp: 000000002fe93d78)
Krnl PSW : 0400e00180000000 0000000000032c6c (pick_next_task_fair+0x34/0xb0)
R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:2 PM:0 EA:3
Krnl GPRS: 00000000001ff000 0000000000030bd8 000000000075a380 000000002fe7ccf8
0000000000386690 0000000000000008 0000000000000000 000000002fe7cf58
0000000000000001 000000000075a300 0000000000000000 000000002fe93d40
005a800000031201 0000000000386010 000000002fe93d78 000000002fe93d40
Krnl Code: 0000000000032c5c: e3e0f0980024 stg %r14,152(%r15)
0000000000032c62: d507d000c010 clc 0(8,%r13),16(%r12)
0000000000032c68: a784003c brc 8,32ce0
>0000000000032c6c: d507d000c030 clc 0(8,%r13),48(%r12)
0000000000032c72: b904002c lgr %r2,%r12
0000000000032c76: a7a90000 lghi %r10,0
0000000000032c7a: a7840021 brc 8,32cbc
0000000000032c7e: c0e5ffffefe3 brasl %r14,30c44
Call Trace:
([<000000000075a300>] 0x75a300)
[<000000000037195a>] schedule+0x162/0x7f4
[<000000000001a2be>] cpu_idle+0x1ca/0x25c
[<000000000036f368>] start_secondary+0xac/0xb8
[<0000000000000000>] 0x0
[<0000000000000000>] 0x0
Last Breaking-Event-Address:
[<0000000000032cc6>] pick_next_task_fair+0x8e/0xb0
<4>---[ end trace 9bb55df196feedcc ]---
Kernel panic - not syncing: Attempted to kill the idle task!
Please note that the above call trace is from s390; however, Avi reported the
same bug on x86_64.
I tried to bisect this and ended up somewhere at the beginning of 2.6.23, when
the CFS patches were merged. Unfortunately the bug got harder and harder to
reproduce, so I couldn't bisect it down to a single patch.
One observation however is that this always happens after cpu_up(), not
cpu_down().
I modified the kernel sources a bit (actually only added a single "noinline")
to get some sensible debug data and dumped a crashed system. These are the
contents of the scheduler data structures which cause the crash:
>> px *(cfs_rq *) 0x75a380
struct cfs_rq {
load = struct load_weight {
weight = 0x800
inv_weight = 0x0
}
nr_running = 0x1
exec_clock = 0x0
min_vruntime = 0xbf7e9776
tasks_timeline = struct rb_root {
rb_node = (nil)
}
rb_leftmost = (nil) <<<<<<<<<<<< shouldn't be NULL
tasks = struct list_head {
next = 0x759328
prev = 0x759328
}
balance_iterator = (nil)
curr = 0x759300
next = (nil)
nr_spread_over = 0x0
rq = 0x75a300
leaf_cfs_rq_list = struct list_head {
next = (nil)
prev = (nil)
}
tg = 0x564970
}
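[Editor's note: the reason this particular state oopses is that the fair-class pick path trusts cfs_rq->nr_running and then follows the cached rb_leftmost pointer with no NULL check. The following toy model illustrates that control flow; the struct layout and function names are simplified, hypothetical stand-ins for kernel/sched_fair.c, not the actual kernel code.]

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures involved. */
struct rb_node { struct rb_node *rb_left, *rb_right; };
struct sched_entity {
    struct rb_node run_node;   /* first member, so the cast below works */
};
struct toy_cfs_rq {
    unsigned long nr_running;
    struct rb_node *rb_leftmost;   /* cached leftmost node of the tree */
};

/* Mirrors the shape of the pick path: once nr_running says the queue is
 * non-empty, the leftmost pointer is followed unconditionally.  In the
 * dumped state (nr_running == 1, rb_leftmost == NULL) the real code
 * dereferences NULL; here we return NULL so the broken invariant can be
 * observed instead of crashing. */
static struct sched_entity *toy_pick_next(struct toy_cfs_rq *cfs_rq)
{
    if (!cfs_rq->nr_running)
        return NULL;                      /* nothing runnable */
    if (!cfs_rq->rb_leftmost)
        return NULL;                      /* <-- the kernel crashes here */
    /* container_of() in the real code; valid cast since run_node is first */
    return (struct sched_entity *)cfs_rq->rb_leftmost;
}
```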
The sched_entity that belongs to the cfs_rq:
>> px *(sched_entity *) 0x759300
struct sched_entity {
load = struct load_weight {
weight = 0x800
inv_weight = 0x1ffc01
}
run_node = struct rb_node {
rb_parent_color = 0x1
rb_right = (nil)
rb_left = (nil)
}
group_node = struct list_head {
next = 0x75a3b8
prev = 0x75a3b8
}
on_rq = 0x1
exec_start = 0x189685acb4aa46
sum_exec_runtime = 0x188a2b84c
vruntime = 0xd036bd29
prev_sum_exec_runtime = 0x1672e3f62
last_wakeup = 0x0
avg_overlap = 0x0
parent = (nil)
cfs_rq = 0x75a380
my_q = 0x759400
}
And the rq:
>> px *(rq *) 0x75a300
struct rq {
lock = spinlock_t {
raw_lock = raw_spinlock_t {
owner_cpu = 0xfffffffe
}
break_lock = 0x1
magic = 0xdead4ead
owner_cpu = 0x1
owner = 0x2ef95350
}
nr_running = 0x1
cpu_load = {
[0] 0x3062
[1] 0x2bdf
[2] 0x20db
[3] 0x171e
[4] 0x1010
}
idle_at_tick = 0x0
last_tick_seen = 0x0
in_nohz_recently = 0x0
load = struct load_weight {
weight = 0xc31
inv_weight = 0x0
}
nr_load_updates = 0x95f
nr_switches = 0x3f68
cfs = struct cfs_rq {
load = struct load_weight {
weight = 0x800
inv_weight = 0x0
}
nr_running = 0x1
exec_clock = 0x0
min_vruntime = 0xbf7e9776
tasks_timeline = struct rb_root {
rb_node = (nil)
}
rb_leftmost = (nil)
tasks = struct list_head {
next = 0x759328
prev = 0x759328
}
balance_iterator = (nil)
curr = 0x759300
next = (nil)
nr_spread_over = 0x0
rq = 0x75a300
leaf_cfs_rq_list = struct list_head {
next = (nil)
prev = (nil)
}
tg = 0x564970
}
rt = struct rt_rq {
active = struct rt_prio_array {
bitmap = {
[0] 0x0
[1] 0x1000000000
}
queue = {
[0] struct list_head {
next = 0x75a418
prev = 0x75a418
}
[1] struct list_head {
next = 0x75a428
prev = 0x75a428
}
[2] struct list_head {
next = 0x75a438
prev = 0x75a438
}
[3] struct list_head {
next = 0x75a448
prev = 0x75a448
}
[4] struct list_head {
next = 0x75a458
prev = 0x75a458
}
[5] struct list_head {
next = 0x75a468
prev = 0x75a468
}
[6] struct list_head {
next = 0x75a478
prev = 0x75a478
}
[7] struct list_head {
next = 0x75a488
prev = 0x75a488
}
[8] struct list_head {
next = 0x75a498
prev = 0x75a498
}
[9] struct list_head {
next = 0x75a4a8
prev = 0x75a4a8
}
[10] struct list_head {
next = 0x75a4b8
prev = 0x75a4b8
}
[11] struct list_head {
next = 0x75a4c8
prev = 0x75a4c8
}
[12] struct list_head {
next = 0x75a4d8
prev = 0x75a4d8
}
[13] struct list_head {
next = 0x75a4e8
prev = 0x75a4e8
}
[14] struct list_head {
next = 0x75a4f8
prev = 0x75a4f8
}
[15] struct list_head {
next = 0x75a508
prev = 0x75a508
}
[16] struct list_head {
next = 0x75a518
prev = 0x75a518
}
[17] struct list_head {
next = 0x75a528
prev = 0x75a528
}
[18] struct list_head {
next = 0x75a538
prev = 0x75a538
}
[19] struct list_head {
next = 0x75a548
prev = 0x75a548
}
[20] struct list_head {
next = 0x75a558
prev = 0x75a558
}
[21] struct list_head {
next = 0x75a568
prev = 0x75a568
}
[22] struct list_head {
next = 0x75a578
prev = 0x75a578
}
[23] struct list_head {
next = 0x75a588
prev = 0x75a588
}
[24] struct list_head {
next = 0x75a598
prev = 0x75a598
}
[25] struct list_head {
next = 0x75a5a8
prev = 0x75a5a8
}
[26] struct list_head {
next = 0x75a5b8
prev = 0x75a5b8
}
[27] struct list_head {
next = 0x75a5c8
prev = 0x75a5c8
}
[28] struct list_head {
next = 0x75a5d8
prev = 0x75a5d8
}
[29] struct list_head {
next = 0x75a5e8
prev = 0x75a5e8
}
[30] struct list_head {
next = 0x75a5f8
prev = 0x75a5f8
}
[31] struct list_head {
next = 0x75a608
prev = 0x75a608
}
[32] struct list_head {
next = 0x75a618
prev = 0x75a618
}
[33] struct list_head {
next = 0x75a628
prev = 0x75a628
}
[34] struct list_head {
next = 0x75a638
prev = 0x75a638
}
[35] struct list_head {
next = 0x75a648
prev = 0x75a648
}
[36] struct list_head {
next = 0x75a658
prev = 0x75a658
}
[37] struct list_head {
next = 0x75a668
prev = 0x75a668
}
[38] struct list_head {
next = 0x75a678
prev = 0x75a678
}
[39] struct list_head {
next = 0x75a688
prev = 0x75a688
}
[40] struct list_head {
next = 0x75a698
prev = 0x75a698
}
[41] struct list_head {
next = 0x75a6a8
prev = 0x75a6a8
}
[42] struct list_head {
next = 0x75a6b8
prev = 0x75a6b8
}
[43] struct list_head {
next = 0x75a6c8
prev = 0x75a6c8
}
[44] struct list_head {
next = 0x75a6d8
prev = 0x75a6d8
}
[45] struct list_head {
next = 0x75a6e8
prev = 0x75a6e8
}
[46] struct list_head {
next = 0x75a6f8
prev = 0x75a6f8
}
[47] struct list_head {
next = 0x75a708
prev = 0x75a708
}
[48] struct list_head {
next = 0x75a718
prev = 0x75a718
}
[49] struct list_head {
next = 0x75a728
prev = 0x75a728
}
[50] struct list_head {
next = 0x75a738
prev = 0x75a738
}
[51] struct list_head {
next = 0x75a748
prev = 0x75a748
}
[52] struct list_head {
next = 0x75a758
prev = 0x75a758
}
[53] struct list_head {
next = 0x75a768
prev = 0x75a768
}
[54] struct list_head {
next = 0x75a778
prev = 0x75a778
}
[55] struct list_head {
next = 0x75a788
prev = 0x75a788
}
[56] struct list_head {
next = 0x75a798
prev = 0x75a798
}
[57] struct list_head {
next = 0x75a7a8
prev = 0x75a7a8
}
[58] struct list_head {
next = 0x75a7b8
prev = 0x75a7b8
}
[59] struct list_head {
next = 0x75a7c8
prev = 0x75a7c8
}
[60] struct list_head {
next = 0x75a7d8
prev = 0x75a7d8
}
[61] struct list_head {
next = 0x75a7e8
prev = 0x75a7e8
}
[62] struct list_head {
next = 0x75a7f8
prev = 0x75a7f8
}
[63] struct list_head {
next = 0x75a808
prev = 0x75a808
}
[64] struct list_head {
next = 0x75a818
prev = 0x75a818
}
[65] struct list_head {
next = 0x75a828
prev = 0x75a828
}
[66] struct list_head {
next = 0x75a838
prev = 0x75a838
}
[67] struct list_head {
next = 0x75a848
prev = 0x75a848
}
[68] struct list_head {
next = 0x75a858
prev = 0x75a858
}
[69] struct list_head {
next = 0x75a868
prev = 0x75a868
}
[70] struct list_head {
next = 0x75a878
prev = 0x75a878
}
[71] struct list_head {
next = 0x75a888
prev = 0x75a888
}
[72] struct list_head {
next = 0x75a898
prev = 0x75a898
}
[73] struct list_head {
next = 0x75a8a8
prev = 0x75a8a8
}
[74] struct list_head {
next = 0x75a8b8
prev = 0x75a8b8
}
[75] struct list_head {
next = 0x75a8c8
prev = 0x75a8c8
}
[76] struct list_head {
next = 0x75a8d8
prev = 0x75a8d8
}
[77] struct list_head {
next = 0x75a8e8
prev = 0x75a8e8
}
[78] struct list_head {
next = 0x75a8f8
prev = 0x75a8f8
}
[79] struct list_head {
next = 0x75a908
prev = 0x75a908
}
[80] struct list_head {
next = 0x75a918
prev = 0x75a918
}
[81] struct list_head {
next = 0x75a928
prev = 0x75a928
}
[82] struct list_head {
next = 0x75a938
prev = 0x75a938
}
[83] struct list_head {
next = 0x75a948
prev = 0x75a948
}
[84] struct list_head {
next = 0x75a958
prev = 0x75a958
}
[85] struct list_head {
next = 0x75a968
prev = 0x75a968
}
[86] struct list_head {
next = 0x75a978
prev = 0x75a978
}
[87] struct list_head {
next = 0x75a988
prev = 0x75a988
}
[88] struct list_head {
next = 0x75a998
prev = 0x75a998
}
[89] struct list_head {
next = 0x75a9a8
prev = 0x75a9a8
}
[90] struct list_head {
next = 0x75a9b8
prev = 0x75a9b8
}
[91] struct list_head {
next = 0x75a9c8
prev = 0x75a9c8
}
[92] struct list_head {
next = 0x75a9d8
prev = 0x75a9d8
}
[93] struct list_head {
next = 0x75a9e8
prev = 0x75a9e8
}
[94] struct list_head {
next = 0x75a9f8
prev = 0x75a9f8
}
[95] struct list_head {
next = 0x75aa08
prev = 0x75aa08
}
[96] struct list_head {
next = 0x75aa18
prev = 0x75aa18
}
[97] struct list_head {
next = 0x75aa28
prev = 0x75aa28
}
[98] struct list_head {
next = 0x75aa38
prev = 0x75aa38
}
[99] struct list_head {
next = 0x75aa48
prev = 0x75aa48
}
}
}
rt_nr_running = 0x0
highest_prio = 0x64
rt_nr_migratory = 0x0
overloaded = 0x0
rt_throttled = 0x0
rt_time = 0x123a999
rt_runtime = 0x389fd980
rt_runtime_lock = spinlock_t {
raw_lock = raw_spinlock_t {
owner_cpu = 0x0
}
break_lock = 0x0
magic = 0xdead4ead
owner_cpu = 0xffffffff
owner = 0xffffffffffffffff
}
}
leaf_cfs_rq_list = struct list_head {
next = 0x2f5a8970
prev = 0x759470
}
nr_uninterruptible = 0xfffffffffffffffe
curr = 0x2ef95350
idle = 0x2fe7ccf8
next_balance = 0x10000093b
prev_mm = (nil)
clock = 0x189685acb4d536
nr_iowait = atomic_t {
counter = 0x0
}
rd = 0x564a58
sd = (nil)
active_balance = 0x0
push_cpu = 0x0
cpu = 0x1
migration_thread = 0x2ef95350
migration_queue = struct list_head {
next = 0x75ab10
prev = 0x75ab10
}
rq_lock_key = struct lock_class_key {
}
}
Hopefully all of this debug data is of some use. If you need more, just let me
know.
Thanks!
* Re: [BUG] CFS vs cpu hotplug
  2008-06-19 16:19 [BUG] CFS vs cpu hotplug Heiko Carstens
@ 2008-06-19 18:05 ` Peter Zijlstra
  2008-06-19 18:14   ` Peter Zijlstra
  ` (3 more replies)
  2008-06-25 22:12 ` Dmitry Adamushko
  1 sibling, 4 replies; 28+ messages in thread
From: Peter Zijlstra @ 2008-06-19 18:05 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Ingo Molnar, Avi Kivity, linux-kernel, Dmitry Adamushko
On Thu, 2008-06-19 at 18:19 +0200, Heiko Carstens wrote:
> I'm still seeing kernel crashes on cpu hotplug with Linus' current git tree.
> [...]
> Krnl PSW : 0400e00180000000 0000000000032c6c (pick_next_task_fair+0x34/0xb0)
I presume this is:
	se = pick_next_entity(cfs_rq);
> [register contents, disassembly and call trace trimmed]
>
> >> px *(cfs_rq *) 0x75a380
> struct cfs_rq {
> [...]
> nr_running = 0x1
> [...]
> tasks_timeline = struct rb_root {
> rb_node = (nil)
> }
> rb_leftmost = (nil) <<<<<<<<<<<< shouldn't be NULL
> [...]
> curr = 0x759300
> [...]
> }
Right, this cfs_rq is buggered.
rb_leftmost may be null when the tree is empty (as is the case here).
However cfs_rq->curr != NULL and cfs_rq->nr_running != 0.
So this hints at a missing put_prev_entity() - we keep current out of
the tree, and put it back in right before we schedule(). The advantage
is that we don't need to reposition (dequeue/enqueue) curr in the tree
every time we update its virtual timeline.
So what races so that we can miss put_prev_entity() and how is cpu_up()
special..
> [sched_entity and rq dumps trimmed; reproduced in full above]
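[Editor's note: Peter's "we keep current out of the tree" point above can be modeled in a few lines. This is a toy, hypothetical model — the real logic lives in set_next_entity()/put_prev_entity() in kernel/sched_fair.c and uses an rbtree, whereas a plain array stands in for it here. Picking an entity removes it from the queue structure but leaves it counted in nr_running; only put_prev_entity() re-inserts it, so skipping that put yields exactly the dumped state: nr_running == 1, empty tree, curr set.]

```c
#include <assert.h>
#include <stddef.h>

struct toy_entity { int id; };

/* Toy model of how CFS accounts for the running entity: "curr" is held
 * outside the queue while it runs but still counts toward nr_running. */
struct toy_rq {
    unsigned long nr_running;       /* includes curr */
    struct toy_entity *queued[8];   /* stand-in for the rbtree */
    int nqueued;
    struct toy_entity *curr;
};

/* Stand-in for first_fair(): the "leftmost" entity, or NULL if empty. */
static struct toy_entity *toy_pick_from_tree(struct toy_rq *rq)
{
    return rq->nqueued ? rq->queued[rq->nqueued - 1] : NULL;
}

/* set_next_entity(): take the picked (topmost) entity out of the tree. */
static void toy_set_next(struct toy_rq *rq, struct toy_entity *se)
{
    rq->nqueued--;
    rq->curr = se;
}

/* put_prev_entity(): put the previously running entity back in. */
static void toy_put_prev(struct toy_rq *rq)
{
    if (rq->curr) {
        rq->queued[rq->nqueued++] = rq->curr;
        rq->curr = NULL;
    }
}
```

In a normal schedule() cycle, toy_put_prev() runs before the next pick, so the tree is never empty while nr_running claims work; miss it once and the next pick sees an empty tree with nr_running still 1.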
* Re: [BUG] CFS vs cpu hotplug
  2008-06-19 18:05 ` Peter Zijlstra
@ 2008-06-19 18:14   ` Peter Zijlstra
  2008-06-19 21:14     ` Heiko Carstens
  2008-06-19 21:17   ` Heiko Carstens
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2008-06-19 18:14 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Ingo Molnar, Avi Kivity, linux-kernel, Dmitry Adamushko
On Thu, 2008-06-19 at 20:05 +0200, Peter Zijlstra wrote:
> On Thu, 2008-06-19 at 18:19 +0200, Heiko Carstens wrote:
> > The sched_entity that belongs to the cfs_rq:
> >
> > >> px *(sched_entity *) 0x759300
> > struct sched_entity {
> > [...]
> > cfs_rq = 0x75a380
> > my_q = 0x759400
> > }
Ooh, this thing is with CONFIG_GROUP_SCHED... does it still happen when
you disable that?
Not that that is any excuse for crashing.. but it does simplify the
scheduler somewhat.
* Re: [BUG] CFS vs cpu hotplug
  2008-06-19 18:14 ` Peter Zijlstra
@ 2008-06-19 21:14   ` Heiko Carstens
  2008-06-19 21:26     ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Heiko Carstens @ 2008-06-19 21:14 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, Avi Kivity, linux-kernel, Dmitry Adamushko
On Thu, Jun 19, 2008 at 08:14:02PM +0200, Peter Zijlstra wrote:
> Ooh, this thing is with CONFIG_GROUP_SCHED... does it still happen when
> you disable that?
Indeed, when CONFIG_GROUP_SCHED is disabled I cannot reproduce it anymore.
* Re: [BUG] CFS vs cpu hotplug
  2008-06-19 21:14 ` Heiko Carstens
@ 2008-06-19 21:26   ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2008-06-19 21:26 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Ingo Molnar, Avi Kivity, linux-kernel, Dmitry Adamushko
On Thu, 2008-06-19 at 23:14 +0200, Heiko Carstens wrote:
> Indeed, when CONFIG_GROUP_SCHED is disabled I cannot reproduce it anymore.
Ok, that gives us some idea where to look, thanks for this data point.
* Re: [BUG] CFS vs cpu hotplug
  2008-06-19 18:05 ` Peter Zijlstra
  2008-06-19 18:14   ` Peter Zijlstra
@ 2008-06-19 21:17   ` Heiko Carstens
  2008-06-19 21:32   ` Peter Zijlstra
  2008-06-20 11:44   ` Dmitry Adamushko
  3 siblings, 0 replies; 28+ messages in thread
From: Heiko Carstens @ 2008-06-19 21:17 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, Avi Kivity, linux-kernel, Dmitry Adamushko
On Thu, Jun 19, 2008 at 08:05:10PM +0200, Peter Zijlstra wrote:
> > Krnl PSW : 0400e00180000000 0000000000032c6c (pick_next_task_fair+0x34/0xb0)
>
> I presume this is:
>
> 	se = pick_next_entity(cfs_rq);
Yes, that is correct. Sorry, missed to tell about this detail.
* Re: [BUG] CFS vs cpu hotplug
  2008-06-19 18:05 ` Peter Zijlstra
  2008-06-19 18:14   ` Peter Zijlstra
  2008-06-19 21:17   ` Heiko Carstens
@ 2008-06-19 21:32   ` Peter Zijlstra
  2008-06-19 21:49     ` Heiko Carstens
  2008-06-20 11:44   ` Dmitry Adamushko
  3 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2008-06-19 21:32 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Ingo Molnar, Avi Kivity, linux-kernel, Dmitry Adamushko
On Thu, 2008-06-19 at 20:05 +0200, Peter Zijlstra wrote:
> On Thu, 2008-06-19 at 18:19 +0200, Heiko Carstens wrote:
> > [sched_entity dump trimmed; it shows my_q = 0x759400]
If you still have this dump, could you give the output of:
	px *(struct cfs_rq *) 0x759400
And possibly walk down the chain getting its curr and then my_q again
etc..
* Re: [BUG] CFS vs cpu hotplug
  2008-06-19 21:32 ` Peter Zijlstra
@ 2008-06-19 21:49   ` Heiko Carstens
  2008-06-20  8:51     ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Heiko Carstens @ 2008-06-19 21:49 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, Avi Kivity, linux-kernel, Dmitry Adamushko
On Thu, Jun 19, 2008 at 11:32:29PM +0200, Peter Zijlstra wrote:
> If you still have this dump, could you give the output of:
>
> 	px *(struct cfs_rq *) 0x759400
>
> And possibly walk down the chain getting its curr and then my_q again
> etc..
Sure, fortunately just a very short chain:
>> px *(struct cfs_rq *) 0x759400
struct cfs_rq {
load = struct load_weight {
weight = 0xc31
inv_weight = 0x0
}
nr_running = 0x1
exec_clock = 0x0
min_vruntime = 0x4f216b005
tasks_timeline = struct rb_root {
rb_node = 0x2fca4d40
}
rb_leftmost = 0x2fca4d40
tasks = struct list_head {
next = 0x2fca4d58
prev = 0x2fca4d58
}
balance_iterator = 0x2e29e700
curr = 0x2ef4f388
next = (nil)
nr_spread_over = 0x0
rq = 0x75a300
leaf_cfs_rq_list = struct list_head {
next = 0x75aaa0
prev = 0x2e1eca70
}
tg = 0x564910
}
>> px *(sched_entity *) 0x2ef4f388
struct sched_entity {
load = struct load_weight {
weight = 0x400
inv_weight = 0x400000
}
run_node = struct rb_node {
rb_parent_color = 0x2f07b399
rb_right = (nil)
rb_left = (nil)
}
group_node = struct list_head {
next = 0x2ef4f3b0
prev = 0x2ef4f3b0
}
on_rq = 0x0
exec_start = 0x189685c9a77b96
sum_exec_runtime = 0x3c51111
vruntime = 0x493becf68
prev_sum_exec_runtime = 0x3c50997
last_wakeup = 0x0
avg_overlap = 0x4b67d1
parent = 0x763300
cfs_rq = 0x763400
my_q = (nil)
}
* Re: [BUG] CFS vs cpu hotplug
  2008-06-19 21:49 ` Heiko Carstens
@ 2008-06-20  8:51   ` Peter Zijlstra
  2008-06-20 22:19     ` Heiko Carstens
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2008-06-20 8:51 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Ingo Molnar, Avi Kivity, linux-kernel, Dmitry Adamushko
On Thu, 2008-06-19 at 23:49 +0200, Heiko Carstens wrote:
> On Thu, Jun 19, 2008 at 11:32:29PM +0200, Peter Zijlstra wrote:
> > If you still have this dump, could you give the output of:
> >
> > 	px *(struct cfs_rq *) 0x759400
> >
> > And possibly walk down the chain getting its curr and then my_q again
> > etc..
> > Sure, fortunately just a very short chain: > > >> px *(struct cfs_rq *) 0x759400 > struct cfs_rq { > load = struct load_weight { > weight = 0xc31 > inv_weight = 0x0 > } > nr_running = 0x1 > exec_clock = 0x0 > min_vruntime = 0x4f216b005 > tasks_timeline = struct rb_root { > rb_node = 0x2fca4d40 > } > rb_leftmost = 0x2fca4d40 > tasks = struct list_head { > next = 0x2fca4d58 > prev = 0x2fca4d58 > } > balance_iterator = 0x2e29e700 > curr = 0x2ef4f388 > next = (nil) > nr_spread_over = 0x0 > rq = 0x75a300 > leaf_cfs_rq_list = struct list_head { > next = 0x75aaa0 > prev = 0x2e1eca70 > } > tg = 0x564910 > } Hmm this one is buggered as well, it has nr_running = 1, and one entry in the tree, but also a !NULL curr. Could you please show: px *container_of(0x2fca4d40, struct sched_entity, run_node) which one might have to write like: px *((struct sched_entity *)((char*)0x2fca4d40) - ((unsigned long)&(((struct sched_entity *)0)->run_node))) /me prays he got the braces right,.. > >> px *(sched_entity *) 0x2ef4f388 > struct sched_entity { > load = struct load_weight { > weight = 0x400 > inv_weight = 0x400000 > } > run_node = struct rb_node { > rb_parent_color = 0x2f07b399 > rb_right = (nil) > rb_left = (nil) > } > group_node = struct list_head { > next = 0x2ef4f3b0 > prev = 0x2ef4f3b0 > } > on_rq = 0x0 > exec_start = 0x189685c9a77b96 > sum_exec_runtime = 0x3c51111 > vruntime = 0x493becf68 > prev_sum_exec_runtime = 0x3c50997 > last_wakeup = 0x0 > avg_overlap = 0x4b67d1 > parent = 0x763300 > cfs_rq = 0x763400 > my_q = (nil) > } This one seems un-associated with the rest of the chain, as per its back-pointers. Fancy puzzle,.. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug 2008-06-20 8:51 ` Peter Zijlstra @ 2008-06-20 22:19 ` Heiko Carstens 0 siblings, 0 replies; 28+ messages in thread From: Heiko Carstens @ 2008-06-20 22:19 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Ingo Molnar, Avi Kivity, linux-kernel, Dmitry Adamushko On Fri, Jun 20, 2008 at 10:51:03AM +0200, Peter Zijlstra wrote: > On Thu, 2008-06-19 at 23:49 +0200, Heiko Carstens wrote: > > On Thu, Jun 19, 2008 at 11:32:29PM +0200, Peter Zijlstra wrote: > > > On Thu, 2008-06-19 at 20:05 +0200, Peter Zijlstra wrote: > > > > On Thu, 2008-06-19 at 18:19 +0200, Heiko Carstens wrote: > > > > > > > > The sched_entity that belongs to the cfs_rq: > > > > > > > > > > >> px *(sched_entity *) 0x759300 > > > > > struct sched_entity { > > > > > load = struct load_weight { > > > > > weight = 0x800 > > > > > inv_weight = 0x1ffc01 > > > > > } > > > > > run_node = struct rb_node { > > > > > rb_parent_color = 0x1 > > > > > rb_right = (nil) > > > > > rb_left = (nil) > > > > > } > > > > > group_node = struct list_head { > > > > > next = 0x75a3b8 > > > > > prev = 0x75a3b8 > > > > > } > > > > > on_rq = 0x1 > > > > > exec_start = 0x189685acb4aa46 > > > > > sum_exec_runtime = 0x188a2b84c > > > > > vruntime = 0xd036bd29 > > > > > prev_sum_exec_runtime = 0x1672e3f62 > > > > > last_wakeup = 0x0 > > > > > avg_overlap = 0x0 > > > > > parent = (nil) > > > > > cfs_rq = 0x75a380 > > > > > my_q = 0x759400 > > > > > } > > > > > > If you still have this dump, could you give the output of: > > > > > > px *(struct cfs_rq *) 0x759400 > > > > > > And possibly walk down the chain getting its curr and then my_q again > > > etc.. 
> > > > Sure, fortunately just a very short chain: > > > > >> px *(struct cfs_rq *) 0x759400 > > struct cfs_rq { > > load = struct load_weight { > > weight = 0xc31 > > inv_weight = 0x0 > > } > > nr_running = 0x1 > > exec_clock = 0x0 > > min_vruntime = 0x4f216b005 > > tasks_timeline = struct rb_root { > > rb_node = 0x2fca4d40 > > } > > rb_leftmost = 0x2fca4d40 > > tasks = struct list_head { > > next = 0x2fca4d58 > > prev = 0x2fca4d58 > > } > > balance_iterator = 0x2e29e700 > > curr = 0x2ef4f388 > > next = (nil) > > nr_spread_over = 0x0 > > rq = 0x75a300 > > leaf_cfs_rq_list = struct list_head { > > next = 0x75aaa0 > > prev = 0x2e1eca70 > > } > > tg = 0x564910 > > } > > Hmm this one is buggered as well, it has nr_running = 1, and one entry > in the tree, but also a !NULL curr. > > Could you please show: > > px *container_of(0x2fca4d40, struct sched_entity, run_node) > > which one might have to write like: > > px *((struct sched_entity *)((char*)0x2fca4d40) - ((unsigned long)&(((struct sched_entity *)0)->run_node))) > > /me prays he got the braces right,.. Here we go: >> offset sched_entity.run_node Offset: 16 bytes. >> px *(sched_entity *) 0x2fca4d30 struct sched_entity { load = struct load_weight { weight = 0xc31 inv_weight = 0x14ff97 } run_node = struct rb_node { rb_parent_color = 0x1 rb_right = (nil) rb_left = (nil) } group_node = struct list_head { next = 0x759438 prev = 0x759438 } on_rq = 0x1 exec_start = 0x1896859fb4ff76 sum_exec_runtime = 0x1f19 vruntime = 0x4f128ead9 prev_sum_exec_runtime = 0x0 last_wakeup = 0x0 avg_overlap = 0x0 parent = 0x759300 cfs_rq = 0x759400 my_q = (nil) } ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug 2008-06-19 18:05 ` Peter Zijlstra ` (2 preceding siblings ...) 2008-06-19 21:32 ` Peter Zijlstra @ 2008-06-20 11:44 ` Dmitry Adamushko 2008-06-20 22:23 ` Heiko Carstens 3 siblings, 1 reply; 28+ messages in thread From: Dmitry Adamushko @ 2008-06-20 11:44 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Heiko Carstens, Ingo Molnar, Avi Kivity, linux-kernel 2008/6/19 Peter Zijlstra <a.p.zijlstra@chello.nl>: > On Thu, 2008-06-19 at 18:19 +0200, Heiko Carstens wrote: >> Hi Ingo, Peter, >> >> I'm still seeing kernel crashes on cpu hotplug with Linus' current git tree. >> All I have to do is to make all cpus busy (make -j4 of the kernel source is >> sufficient) and then start cpu hotplug stress. >> It usually takes below a minute to crash the system like this: >> >> Unable to handle kernel pointer dereference at virtual kernel address 005a800000031000 >> Oops: 0038 [#1] PREEMPT SMP >> Modules linked in: >> CPU: 1 Not tainted 2.6.26-rc6-00232-g9bedbcb #356 >> Process swapper (pid: 0, task: 000000002fe7ccf8, ksp: 000000002fe93d78) >> Krnl PSW : 0400e00180000000 0000000000032c6c (pick_next_task_fair+0x34/0xb0) > > I presume this is: > > se = pick_next_entity(cfs_rq); > >> R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:2 PM:0 EA:3 >> Krnl GPRS: 00000000001ff000 0000000000030bd8 000000000075a380 000000002fe7ccf8 >> 0000000000386690 0000000000000008 0000000000000000 000000002fe7cf58 >> 0000000000000001 000000000075a300 0000000000000000 000000002fe93d40 >> 005a800000031201 0000000000386010 000000002fe93d78 000000002fe93d40 >> Krnl Code: 0000000000032c5c: e3e0f0980024 stg %r14,152(%r15) >> 0000000000032c62: d507d000c010 clc 0(8,%r13),16(%r12) >> 0000000000032c68: a784003c brc 8,32ce0 >> >0000000000032c6c: d507d000c030 clc 0(8,%r13),48(%r12) >> 0000000000032c72: b904002c lgr %r2,%r12 >> 0000000000032c76: a7a90000 lghi %r10,0 >> 0000000000032c7a: a7840021 brc 8,32cbc >> 0000000000032c7e: c0e5ffffefe3 brasl %r14,30c44 >> Call Trace: >> 
([<000000000075a300>] 0x75a300) >> [<000000000037195a>] schedule+0x162/0x7f4 >> [<000000000001a2be>] cpu_idle+0x1ca/0x25c >> [<000000000036f368>] start_secondary+0xac/0xb8 >> [<0000000000000000>] 0x0 >> [<0000000000000000>] 0x0 >> Last Breaking-Event-Address: >> [<0000000000032cc6>] pick_next_task_fair+0x8e/0xb0 >> <4>---[ end trace 9bb55df196feedcc ]--- >> Kernel panic - not syncing: Attempted to kill the idle task! >> >> Please note that the above call trace is from s390, however Avi reported the >> same bug on x86_64. >> >> I tried to bisect this and ended up somewhere at the beginning of 2.6.23 when >> the CFS patches got merged. Unfortunately it got harder and harder to reproduce >> so that I couldn't bisect this down to a single patch. >> >> One observation however is that this always happens after cpu_up(), not >> cpu_down(). >> >> I modified the kernel sources a bit (actually only added a single "noinline") >> to get some sensible debug data and dumped a crashed system. These are the >> contents of the scheduler data structures which cause the crash: >> >> >> px *(cfs_rq *) 0x75a380 >> struct cfs_rq { >> load = struct load_weight { >> weight = 0x800 >> inv_weight = 0x0 >> } >> nr_running = 0x1 >> exec_clock = 0x0 >> min_vruntime = 0xbf7e9776 >> tasks_timeline = struct rb_root { >> rb_node = (nil) >> } >> rb_leftmost = (nil) <<<<<<<<<<<< shouldn't be NULL >> tasks = struct list_head { >> next = 0x759328 >> prev = 0x759328 >> } >> balance_iterator = (nil) >> curr = 0x759300 >> next = (nil) >> nr_spread_over = 0x0 >> rq = 0x75a300 >> leaf_cfs_rq_list = struct list_head { >> next = (nil) >> prev = (nil) >> } >> tg = 0x564970 >> } > > Right, this cfs_rq is buggered. rb_leftmost may be null when the tree is > empty (as is the case here). > > However cfs_rq->curr != NULL and cfs_rq->nr_running != 0. > > So this hints at a missing put_prev_entity() - we keep current out of > the tree, and put it back in right before we schedule(). 
The advantage > is that we don't need to reposition (dequeue/enqueue) curr in the tree > every time we update its virtual timeline. > > So what races so that we can miss put_prev_entity() and how is cpu_up() > special.. > hum, I'd rather suppose that something weird happened at the time of cpu_down() and some per-cpu data is already inconsistent by the time of cpu_up(). Is it with CONFIG_USER_SCHED? Maybe we can write a small function that does a 'sanety' check of : for all sched_groups (task_groups's) : check 'sanity' of group->cfs_rq[CPU] and group->se[CPU] somewhere early in cpu_up(). So we can verify whether it's legacy of cpu_down() or something related to cpu_up(). hm? -- Best regards, Dmitry Adamushko ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug 2008-06-20 11:44 ` Dmitry Adamushko @ 2008-06-20 22:23 ` Heiko Carstens 0 siblings, 0 replies; 28+ messages in thread From: Heiko Carstens @ 2008-06-20 22:23 UTC (permalink / raw) To: Dmitry Adamushko; +Cc: Peter Zijlstra, Ingo Molnar, Avi Kivity, linux-kernel On Fri, Jun 20, 2008 at 01:44:41PM +0200, Dmitry Adamushko wrote: > 2008/6/19 Peter Zijlstra <a.p.zijlstra@chello.nl>: > > Right, this cfs_rq is buggered. rb_leftmost may be null when the tree is > > empty (as is the case here). > > > > However cfs_rq->curr != NULL and cfs_rq->nr_running != 0. > > > > So this hints at a missing put_prev_entity() - we keep current out of > > the tree, and put it back in right before we schedule(). The advantage > > is that we don't need to reposition (dequeue/enqueue) curr in the tree > > every time we update its virtual timeline. > > > > So what races so that we can miss put_prev_entity() and how is cpu_up() > > special.. > > > > hum, I'd rather suppose that something weird happened at the time of > cpu_down() and some per-cpu data is already inconsistent by the time > of cpu_up(). > > Is it with CONFIG_USER_SCHED? Yes. For full config see below. > Maybe we can write a small function that does a 'sanety' check of : > > for all sched_groups (task_groups's) : check 'sanity' of > group->cfs_rq[CPU] and group->se[CPU] somewhere early in cpu_up(). > > So we can verify whether it's legacy of cpu_down() or something > related to cpu_up(). > > hm? If you have a patch at hand, I'll give it a try. 
# # Automatically generated make config: don't edit # Linux kernel version: 2.6.26-rc6 # Sat Jun 21 00:20:36 2008 # CONFIG_SCHED_MC=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_GENERIC_HWEIGHT=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_BUG=y CONFIG_NO_IOMEM=y CONFIG_NO_DMA=y CONFIG_GENERIC_LOCKBREAK=y CONFIG_PGSTE=y CONFIG_S390=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set CONFIG_AUDIT=y # CONFIG_AUDITSYSCALL is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=17 CONFIG_CGROUPS=y # CONFIG_CGROUP_DEBUG is not set CONFIG_CGROUP_NS=y # CONFIG_CGROUP_DEVICE is not set # CONFIG_CPUSETS is not set CONFIG_GROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y # CONFIG_RT_GROUP_SCHED is not set CONFIG_USER_SCHED=y # CONFIG_CGROUP_SCHED is not set # CONFIG_CGROUP_CPUACCT is not set # CONFIG_RESOURCE_COUNTERS is not set CONFIG_SYSFS_DEPRECATED=y CONFIG_SYSFS_DEPRECATED_V2=y # CONFIG_RELAY is not set CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y # CONFIG_USER_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_SYSCTL_SYSCALL=y CONFIG_SYSCTL_SYSCALL_CHECK=y CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y # CONFIG_COMPAT_BRK is not set CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y 
CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLAB=y # CONFIG_SLUB is not set # CONFIG_SLOB is not set # CONFIG_PROFILING is not set # CONFIG_MARKERS is not set CONFIG_HAVE_OPROFILE=y CONFIG_KPROBES=y CONFIG_KRETPROBES=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y # CONFIG_HAVE_DMA_ATTRS is not set CONFIG_PROC_PAGE_MONITOR=y CONFIG_SLABINFO=y CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y # CONFIG_MODULE_FORCE_LOAD is not set CONFIG_MODULE_UNLOAD=y # CONFIG_MODULE_FORCE_UNLOAD is not set CONFIG_MODVERSIONS=y # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y # CONFIG_BLK_DEV_IO_TRACE is not set CONFIG_BLK_DEV_BSG=y CONFIG_BLOCK_COMPAT=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set CONFIG_DEFAULT_DEADLINE=y # CONFIG_DEFAULT_CFQ is not set # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="deadline" CONFIG_PREEMPT_NOTIFIERS=y CONFIG_CLASSIC_RCU=y # # Base setup # # # Processor type and features # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_64BIT=y CONFIG_SMP=y CONFIG_NR_CPUS=32 CONFIG_HOTPLUG_CPU=y CONFIG_COMPAT=y CONFIG_SYSVIPC_COMPAT=y CONFIG_AUDIT_ARCH=y CONFIG_S390_SWITCH_AMODE=y CONFIG_S390_EXEC_PROTECT=y # # Code generation options # # CONFIG_MARCH_G5 is not set CONFIG_MARCH_Z900=y # CONFIG_MARCH_Z990 is not set # CONFIG_MARCH_Z9_109 is not set CONFIG_PACK_STACK=y # CONFIG_SMALL_STACK is not set CONFIG_CHECK_STACK=y CONFIG_STACK_GUARD=256 # CONFIG_WARN_STACK is not set CONFIG_ARCH_POPULATES_NODE_MAP=y # # Kernel preemption # # CONFIG_PREEMPT_NONE is not set # CONFIG_PREEMPT_VOLUNTARY is not set CONFIG_PREEMPT=y # CONFIG_PREEMPT_RCU is not set CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_ARCH_SPARSEMEM_DEFAULT=y CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_SELECT_MEMORY_MODEL=y # 
CONFIG_FLATMEM_MANUAL is not set # CONFIG_DISCONTIGMEM_MANUAL is not set CONFIG_SPARSEMEM_MANUAL=y CONFIG_SPARSEMEM=y CONFIG_HAVE_MEMORY_PRESENT=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_SPARSEMEM_EXTREME=y CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y CONFIG_SPARSEMEM_VMEMMAP=y CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_RESOURCES_64BIT=y CONFIG_ZONE_DMA_FLAG=1 CONFIG_BOUNCE=y CONFIG_VIRT_TO_BUS=y # # I/O subsystem configuration # CONFIG_MACHCHK_WARNING=y CONFIG_QDIO=y # CONFIG_QDIO_DEBUG is not set # # Misc # CONFIG_IPL=y # CONFIG_IPL_TAPE is not set CONFIG_IPL_VM=y CONFIG_BINFMT_ELF=y CONFIG_BINFMT_MISC=m CONFIG_FORCE_MAX_ZONEORDER=9 # CONFIG_PROCESS_DEBUG is not set CONFIG_PFAULT=y # CONFIG_SHARED_KERNEL is not set # CONFIG_CMM is not set # CONFIG_PAGE_STATES is not set CONFIG_VIRT_TIMER=y CONFIG_VIRT_CPU_ACCOUNTING=y # CONFIG_APPLDATA_BASE is not set CONFIG_HZ_100=y # CONFIG_HZ_250 is not set # CONFIG_HZ_300 is not set # CONFIG_HZ_1000 is not set CONFIG_HZ=100 # CONFIG_SCHED_HRTICK is not set CONFIG_S390_HYPFS_FS=y CONFIG_KEXEC=y # CONFIG_ZFCPDUMP is not set CONFIG_S390_GUEST=y # # Networking # CONFIG_NET=y # # Networking options # CONFIG_PACKET=y # CONFIG_PACKET_MMAP is not set CONFIG_UNIX=y CONFIG_XFRM=y # CONFIG_XFRM_USER is not set # CONFIG_XFRM_SUB_POLICY is not set # CONFIG_XFRM_MIGRATE is not set # CONFIG_XFRM_STATISTICS is not set CONFIG_NET_KEY=y # CONFIG_NET_KEY_MIGRATE is not set CONFIG_IUCV=m CONFIG_AFIUCV=m CONFIG_INET=y CONFIG_IP_MULTICAST=y # CONFIG_IP_ADVANCED_ROUTER is not set CONFIG_IP_FIB_HASH=y # CONFIG_IP_PNP is not set # CONFIG_NET_IPIP is not set # CONFIG_NET_IPGRE is not set # CONFIG_IP_MROUTE is not set # CONFIG_ARPD is not set # CONFIG_SYN_COOKIES is not set # CONFIG_INET_AH is not set # CONFIG_INET_ESP is not set # CONFIG_INET_IPCOMP is not set # CONFIG_INET_XFRM_TUNNEL is not set CONFIG_INET_TUNNEL=y CONFIG_INET_XFRM_MODE_TRANSPORT=y CONFIG_INET_XFRM_MODE_TUNNEL=y CONFIG_INET_XFRM_MODE_BEET=y CONFIG_INET_LRO=y 
CONFIG_INET_DIAG=y CONFIG_INET_TCP_DIAG=y # CONFIG_TCP_CONG_ADVANCED is not set CONFIG_TCP_CONG_CUBIC=y CONFIG_DEFAULT_TCP_CONG="cubic" # CONFIG_TCP_MD5SIG is not set # CONFIG_IP_VS is not set CONFIG_IPV6=y # CONFIG_IPV6_PRIVACY is not set # CONFIG_IPV6_ROUTER_PREF is not set # CONFIG_IPV6_OPTIMISTIC_DAD is not set # CONFIG_INET6_AH is not set # CONFIG_INET6_ESP is not set # CONFIG_INET6_IPCOMP is not set # CONFIG_IPV6_MIP6 is not set # CONFIG_INET6_XFRM_TUNNEL is not set # CONFIG_INET6_TUNNEL is not set CONFIG_INET6_XFRM_MODE_TRANSPORT=y CONFIG_INET6_XFRM_MODE_TUNNEL=y CONFIG_INET6_XFRM_MODE_BEET=y # CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set CONFIG_IPV6_SIT=y CONFIG_IPV6_NDISC_NODETYPE=y # CONFIG_IPV6_TUNNEL is not set # CONFIG_IPV6_MULTIPLE_TABLES is not set # CONFIG_IPV6_MROUTE is not set # CONFIG_NETWORK_SECMARK is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set CONFIG_NETFILTER_ADVANCED=y # # Core Netfilter Configuration # CONFIG_NETFILTER_NETLINK=m CONFIG_NETFILTER_NETLINK_QUEUE=m CONFIG_NETFILTER_NETLINK_LOG=m CONFIG_NF_CONNTRACK=m # CONFIG_NF_CT_ACCT is not set # CONFIG_NF_CONNTRACK_MARK is not set # CONFIG_NF_CONNTRACK_EVENTS is not set # CONFIG_NF_CT_PROTO_DCCP is not set # CONFIG_NF_CT_PROTO_SCTP is not set # CONFIG_NF_CT_PROTO_UDPLITE is not set # CONFIG_NF_CONNTRACK_AMANDA is not set # CONFIG_NF_CONNTRACK_FTP is not set # CONFIG_NF_CONNTRACK_H323 is not set # CONFIG_NF_CONNTRACK_IRC is not set # CONFIG_NF_CONNTRACK_NETBIOS_NS is not set # CONFIG_NF_CONNTRACK_PPTP is not set # CONFIG_NF_CONNTRACK_SANE is not set # CONFIG_NF_CONNTRACK_SIP is not set # CONFIG_NF_CONNTRACK_TFTP is not set # CONFIG_NF_CT_NETLINK is not set # CONFIG_NETFILTER_XTABLES is not set # # IP: Netfilter Configuration # # CONFIG_NF_CONNTRACK_IPV4 is not set # CONFIG_IP_NF_QUEUE is not set # CONFIG_IP_NF_IPTABLES is not set # CONFIG_IP_NF_ARPTABLES is not set # # IPv6: Netfilter Configuration # # CONFIG_NF_CONNTRACK_IPV6 is not set # CONFIG_IP6_NF_QUEUE is not 
set # CONFIG_IP6_NF_IPTABLES is not set # CONFIG_IP_DCCP is not set CONFIG_IP_SCTP=m # CONFIG_SCTP_DBG_MSG is not set # CONFIG_SCTP_DBG_OBJCNT is not set # CONFIG_SCTP_HMAC_NONE is not set # CONFIG_SCTP_HMAC_SHA1 is not set CONFIG_SCTP_HMAC_MD5=y # CONFIG_TIPC is not set # CONFIG_ATM is not set # CONFIG_BRIDGE is not set # CONFIG_VLAN_8021Q is not set # CONFIG_DECNET is not set # CONFIG_LLC2 is not set # CONFIG_IPX is not set # CONFIG_ATALK is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set CONFIG_NET_SCHED=y # # Queueing/Scheduling # CONFIG_NET_SCH_CBQ=m # CONFIG_NET_SCH_HTB is not set # CONFIG_NET_SCH_HFSC is not set CONFIG_NET_SCH_PRIO=m CONFIG_NET_SCH_RR=m CONFIG_NET_SCH_RED=m CONFIG_NET_SCH_SFQ=m CONFIG_NET_SCH_TEQL=m CONFIG_NET_SCH_TBF=m CONFIG_NET_SCH_GRED=m CONFIG_NET_SCH_DSMARK=m # CONFIG_NET_SCH_NETEM is not set # CONFIG_NET_SCH_INGRESS is not set # # Classification # CONFIG_NET_CLS=y # CONFIG_NET_CLS_BASIC is not set CONFIG_NET_CLS_TCINDEX=m CONFIG_NET_CLS_ROUTE4=m CONFIG_NET_CLS_ROUTE=y CONFIG_NET_CLS_FW=m CONFIG_NET_CLS_U32=m # CONFIG_CLS_U32_PERF is not set CONFIG_CLS_U32_MARK=y CONFIG_NET_CLS_RSVP=m CONFIG_NET_CLS_RSVP6=m CONFIG_NET_CLS_FLOW=m # CONFIG_NET_EMATCH is not set CONFIG_NET_CLS_ACT=y CONFIG_NET_ACT_POLICE=y # CONFIG_NET_ACT_GACT is not set # CONFIG_NET_ACT_MIRRED is not set CONFIG_NET_ACT_NAT=m # CONFIG_NET_ACT_PEDIT is not set # CONFIG_NET_ACT_SIMP is not set # CONFIG_NET_CLS_IND is not set CONFIG_NET_SCH_FIFO=y # # Network testing # # CONFIG_NET_PKTGEN is not set # CONFIG_NET_TCPPROBE is not set CONFIG_CAN=m CONFIG_CAN_RAW=m CONFIG_CAN_BCM=m # # CAN Device Drivers # CONFIG_CAN_VCAN=m # CONFIG_CAN_DEBUG_DEVICES is not set # CONFIG_AF_RXRPC is not set # CONFIG_RFKILL is not set # CONFIG_NET_9P is not set # CONFIG_PCMCIA is not set CONFIG_CCW=y # # Device Drivers # # # Generic Driver Options # CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" CONFIG_STANDALONE=y 
CONFIG_PREVENT_FIRMWARE_BUILD=y # CONFIG_FW_LOADER is not set # CONFIG_DEBUG_DRIVER is not set # CONFIG_DEBUG_DEVRES is not set CONFIG_SYS_HYPERVISOR=y # CONFIG_CONNECTOR is not set CONFIG_BLK_DEV=y # CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=m # CONFIG_BLK_DEV_CRYPTOLOOP is not set CONFIG_BLK_DEV_NBD=m CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_COUNT=16 CONFIG_BLK_DEV_RAM_SIZE=4096 CONFIG_BLK_DEV_XIP=y # CONFIG_CDROM_PKTCDVD is not set # CONFIG_ATA_OVER_ETH is not set # # S/390 block device drivers # CONFIG_BLK_DEV_XPRAM=m # CONFIG_DCSSBLK is not set CONFIG_DASD=y CONFIG_DASD_PROFILE=y CONFIG_DASD_ECKD=y CONFIG_DASD_FBA=y CONFIG_DASD_DIAG=y CONFIG_DASD_EER=y CONFIG_VIRTIO_BLK=m CONFIG_MISC_DEVICES=y # CONFIG_EEPROM_93CX6 is not set # CONFIG_ENCLOSURE_SERVICES is not set # CONFIG_HAVE_IDE is not set # # SCSI device support # # CONFIG_RAID_ATTRS is not set CONFIG_SCSI=y # CONFIG_SCSI_DMA is not set # CONFIG_SCSI_TGT is not set CONFIG_SCSI_NETLINK=y CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=y CONFIG_CHR_DEV_ST=y # CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y # CONFIG_CHR_DEV_SCH is not set # # Some SCSI devices (e.g. 
CD jukebox) support multiple LUNs # CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_LOGGING=y CONFIG_SCSI_SCAN_ASYNC=y CONFIG_SCSI_WAIT_SCAN=m # # SCSI Transports # # CONFIG_SCSI_SPI_ATTRS is not set CONFIG_SCSI_FC_ATTRS=y # CONFIG_SCSI_ISCSI_ATTRS is not set # CONFIG_SCSI_SAS_ATTRS is not set # CONFIG_SCSI_SAS_LIBSAS is not set # CONFIG_SCSI_SRP_ATTRS is not set CONFIG_SCSI_LOWLEVEL=y # CONFIG_ISCSI_TCP is not set # CONFIG_SCSI_DEBUG is not set CONFIG_ZFCP=y CONFIG_MD=y CONFIG_BLK_DEV_MD=y CONFIG_MD_LINEAR=m CONFIG_MD_RAID0=m CONFIG_MD_RAID1=m # CONFIG_MD_RAID10 is not set # CONFIG_MD_RAID456 is not set CONFIG_MD_MULTIPATH=m # CONFIG_MD_FAULTY is not set CONFIG_BLK_DEV_DM=y # CONFIG_DM_DEBUG is not set CONFIG_DM_CRYPT=y CONFIG_DM_SNAPSHOT=y CONFIG_DM_MIRROR=y CONFIG_DM_ZERO=y CONFIG_DM_MULTIPATH=y # CONFIG_DM_MULTIPATH_EMC is not set # CONFIG_DM_MULTIPATH_RDAC is not set # CONFIG_DM_MULTIPATH_HP is not set # CONFIG_DM_DELAY is not set # CONFIG_DM_UEVENT is not set CONFIG_NETDEVICES=y # CONFIG_NETDEVICES_MULTIQUEUE is not set # CONFIG_IFB is not set CONFIG_DUMMY=m CONFIG_BONDING=m # CONFIG_MACVLAN is not set CONFIG_EQUALIZER=m CONFIG_TUN=m CONFIG_VETH=m CONFIG_NET_ETHERNET=y # CONFIG_MII is not set # CONFIG_IBM_NEW_EMAC_ZMII is not set # CONFIG_IBM_NEW_EMAC_RGMII is not set # CONFIG_IBM_NEW_EMAC_TAH is not set # CONFIG_IBM_NEW_EMAC_EMAC4 is not set CONFIG_NETDEV_1000=y # CONFIG_E1000E_ENABLED is not set CONFIG_NETDEV_10000=y # CONFIG_TR is not set # CONFIG_WAN is not set # # S/390 network device drivers # CONFIG_LCS=m CONFIG_CTCM=m # CONFIG_NETIUCV is not set # CONFIG_SMSGIUCV is not set # CONFIG_CLAW is not set CONFIG_QETH=y CONFIG_QETH_L2=y CONFIG_QETH_L3=y CONFIG_QETH_IPV6=y CONFIG_CCWGROUP=y # CONFIG_PPP is not set # CONFIG_SLIP is not set # CONFIG_NETCONSOLE is not set # CONFIG_NETPOLL is not set # CONFIG_NET_POLL_CONTROLLER is not set CONFIG_VIRTIO_NET=m # # Character devices # CONFIG_DEVKMEM=y CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y 
CONFIG_LEGACY_PTY_COUNT=256 CONFIG_HW_RANDOM=m # CONFIG_HW_RANDOM_VIRTIO is not set # CONFIG_R3964 is not set CONFIG_RAW_DRIVER=m CONFIG_MAX_RAW_DEVS=256 # CONFIG_HANGCHECK_TIMER is not set # # S/390 character device drivers # CONFIG_TN3270=y CONFIG_TN3270_TTY=y CONFIG_TN3270_FS=m CONFIG_TN3270_CONSOLE=y CONFIG_TN3215=y CONFIG_TN3215_CONSOLE=y CONFIG_CCW_CONSOLE=y CONFIG_SCLP_TTY=y CONFIG_SCLP_CONSOLE=y CONFIG_SCLP_VT220_TTY=y CONFIG_SCLP_VT220_CONSOLE=y CONFIG_SCLP_CPI=m CONFIG_S390_TAPE=m # # S/390 tape interface support # CONFIG_S390_TAPE_BLOCK=y # # S/390 tape hardware support # CONFIG_S390_TAPE_34XX=m # CONFIG_S390_TAPE_3590 is not set # CONFIG_VMLOGRDR is not set # CONFIG_VMCP is not set # CONFIG_MONREADER is not set CONFIG_MONWRITER=m CONFIG_S390_VMUR=m # CONFIG_POWER_SUPPLY is not set # CONFIG_THERMAL is not set # CONFIG_WATCHDOG is not set # # Sonics Silicon Backplane # # CONFIG_MEMSTICK is not set # CONFIG_NEW_LEDS is not set CONFIG_ACCESSIBILITY=y # # File systems # CONFIG_EXT2_FS=y # CONFIG_EXT2_FS_XATTR is not set # CONFIG_EXT2_FS_XIP is not set CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y # CONFIG_EXT3_FS_POSIX_ACL is not set # CONFIG_EXT3_FS_SECURITY is not set # CONFIG_EXT4DEV_FS is not set CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y # CONFIG_XFS_FS is not set # CONFIG_GFS2_FS is not set # CONFIG_OCFS2_FS is not set CONFIG_DNOTIFY=y CONFIG_INOTIFY=y CONFIG_INOTIFY_USER=y # CONFIG_QUOTA is not set # CONFIG_AUTOFS_FS is not set # CONFIG_AUTOFS4_FS is not set # CONFIG_FUSE_FS is not set CONFIG_GENERIC_ACL=y # # CD-ROM/DVD Filesystems # # CONFIG_ISO9660_FS is not set # CONFIG_UDF_FS is not set # # DOS/FAT/NT Filesystems # # CONFIG_MSDOS_FS is not set # CONFIG_VFAT_FS is not set # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_PROC_SYSCTL=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y # CONFIG_HUGETLBFS 
is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_CONFIGFS_FS=m

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
# CONFIG_NFS_V4 is not set
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
# CONFIG_NFSD_V4 is not set
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
# CONFIG_SUNRPC_BIND34 is not set
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_IBM_PARTITION=y
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_BSD_DISKLABEL is not set
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
# CONFIG_EFI_PARTITION is not set
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_NLS is not set
CONFIG_DLM=m
# CONFIG_DLM_DEBUG is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_SCHED_DEBUG is not set
# CONFIG_SCHEDSTATS is not set
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_DEBUG_SLAB is not set
CONFIG_DEBUG_PREEMPT=y
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_INFO is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_WRITECOUNT is not set
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_FRAME_POINTER is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_LKDTM is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_SAMPLES=y
# CONFIG_SAMPLE_KOBJECT is not set
# CONFIG_SAMPLE_KPROBES is not set
# CONFIG_DEBUG_PAGEALLOC is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_HASH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_GF128MUL=m
# CONFIG_CRYPTO_NULL is not set
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=m
CONFIG_CRYPTO_SEQIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CTR=m
CONFIG_CRYPTO_CTS=m
CONFIG_CRYPTO_ECB=m
# CONFIG_CRYPTO_LRW is not set
CONFIG_CRYPTO_PCBC=m
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=m
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
# CONFIG_CRYPTO_CRC32C is not set
# CONFIG_CRYPTO_MD4 is not set
CONFIG_CRYPTO_MD5=m
# CONFIG_CRYPTO_MICHAEL_MIC is not set
CONFIG_CRYPTO_SHA1=m
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set

#
# Ciphers
#
# CONFIG_CRYPTO_AES is not set
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_BLOWFISH is not set
CONFIG_CRYPTO_CAMELLIA=m
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_DES is not set
CONFIG_CRYPTO_FCRYPT=m
# CONFIG_CRYPTO_KHAZAD is not set
CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set

#
# Compression
#
# CONFIG_CRYPTO_DEFLATE is not set
CONFIG_CRYPTO_LZO=m
CONFIG_CRYPTO_HW=y
CONFIG_ZCRYPT=m
# CONFIG_ZCRYPT_MONOLITHIC is not set
# CONFIG_CRYPTO_SHA1_S390 is not set
# CONFIG_CRYPTO_SHA256_S390 is not set
CONFIG_CRYPTO_SHA512_S390=m
# CONFIG_CRYPTO_DES_S390 is not set
# CONFIG_CRYPTO_AES_S390 is not set
CONFIG_S390_PRNG=m

#
# Library routines
#
CONFIG_BITREVERSE=m
# CONFIG_GENERIC_FIND_FIRST_BIT is not set
# CONFIG_GENERIC_FIND_NEXT_BIT is not set
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=m
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_LZO_COMPRESS=m
CONFIG_LZO_DECOMPRESS=m
CONFIG_PLIST=y
CONFIG_HAVE_KVM=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_VIRTIO=y
CONFIG_VIRTIO_RING=y
CONFIG_VIRTIO_BALLOON=m

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-06-19 16:19 [BUG] CFS vs cpu hotplug Heiko Carstens
  2008-06-19 18:05 ` Peter Zijlstra
@ 2008-06-25 22:12 ` Dmitry Adamushko
  2008-06-28 22:16   ` Dmitry Adamushko
  1 sibling, 1 reply; 28+ messages in thread
From: Dmitry Adamushko @ 2008-06-25 22:12 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, linux-kernel

2008/6/19 Heiko Carstens <heiko.carstens@de.ibm.com>:
> Hi Ingo, Peter,
>
> I'm still seeing kernel crashes on cpu hotplug with Linus' current git tree.
> All I have to do is to make all cpus busy (make -j4 of the kernel source is
> sufficient) and then start cpu hotplug stress.
> It usually takes below a minute to crash the system like this:
>
> Unable to handle kernel pointer dereference at virtual kernel address 005a800000031000
> Oops: 0038 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 1 Not tainted 2.6.26-rc6-00232-g9bedbcb #356
> Process swapper (pid: 0, task: 000000002fe7ccf8, ksp: 000000002fe93d78)
> Krnl PSW : 0400e00180000000 0000000000032c6c (pick_next_task_fair+0x34/0xb0)
> R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:2 PM:0 EA:3
> Krnl GPRS: 00000000001ff000 0000000000030bd8 000000000075a380 000000002fe7ccf8
> 0000000000386690 0000000000000008 0000000000000000 000000002fe7cf58
> 0000000000000001 000000000075a300 0000000000000000 000000002fe93d40
> 005a800000031201 0000000000386010 000000002fe93d78 000000002fe93d40
> Krnl Code: 0000000000032c5c: e3e0f0980024  stg %r14,152(%r15)
> 0000000000032c62: d507d000c010    clc 0(8,%r13),16(%r12)
> 0000000000032c68: a784003c        brc 8,32ce0
> >0000000000032c6c: d507d000c030   clc 0(8,%r13),48(%r12)
> 0000000000032c72: b904002c        lgr %r2,%r12
> 0000000000032c76: a7a90000        lghi %r10,0
> 0000000000032c7a: a7840021        brc 8,32cbc
> 0000000000032c7e: c0e5ffffefe3    brasl %r14,30c44
> Call Trace:
> ([<000000000075a300>] 0x75a300)
> [<000000000037195a>] schedule+0x162/0x7f4
> [<000000000001a2be>] cpu_idle+0x1ca/0x25c
> [<000000000036f368>] start_secondary+0xac/0xb8
> [<0000000000000000>] 0x0
> [<0000000000000000>] 0x0
> Last Breaking-Event-Address:
> [<0000000000032cc6>] pick_next_task_fair+0x8e/0xb0
> <4>---[ end trace 9bb55df196feedcc ]---
> Kernel panic - not syncing: Attempted to kill the idle task!
>
> Please note that the above call trace is from s390, however Avi reported the
> same bug on x86_64.

FYI, I've managed to reproduce it 3 times (took 10 to 45 minutes) on my
dual-core Thinkpad R60 with:

(1) make -j3 of the kernel source
(2) a loop with: offline cpu_1 ; sleep 1 ; online cpu_1 ; sleep 1

Two of those times were in the GUI environment, so I couldn't see an oops
(although I could hear it, as the very first time my laptop was constantly
beeping :-)

Strangely enough, an oops didn't appear in the plain console mode either
(well, at least not on the active terminal). However, my additional
debugging message from pick_next_task_fair() did appear on the screen right
before the system froze. It's in the loop of pick_next_task_fair():

	do {
		se = pick_next_entity(cfs_rq);
		if (unlikely(!se))
			printk(KERN_ERR "BUG: se == NULL but nr_running (%ld), load (%ld),"
				" rq-nr_running (%ld), rq-load (%ld)\n",
				cfs_rq->nr_running, cfs_rq->load.weight,
				rq->nr_running, rq->load.weight);
		cfs_rq = group_cfs_rq(se);
	} while (cfs_rq);

and it reported:

BUG: se == NULL but nr_running (1), load (1024), rq-nr_running (1), rq-load (1024)

so there is a crouching gremlin somewhere in the code :-/

--
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-06-25 22:12 ` Dmitry Adamushko
@ 2008-06-28 22:16   ` Dmitry Adamushko
  2008-06-29  6:55     ` Ingo Molnar
  2008-06-30  9:07     ` Heiko Carstens
  0 siblings, 2 replies; 28+ messages in thread
From: Dmitry Adamushko @ 2008-06-28 22:16 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]

Hello,

it seems to be related to migrate_dead_tasks().

Firstly I added traces to see all tasks being migrated with
migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem
pops up (the one with "se == NULL" in the loop of
pick_next_task_fair()) shortly after the traces indicate that some task
has been migrated with migrate_dead_tasks(). Btw., I can reproduce it
much faster now with just a plain cpu down/up loop.

[disclaimer] Well, unless I'm really missing something important in
this late hour [/disclaimer] pick_next_task() is not something
appropriate for migrate_dead_tasks() :-)

The following change seems to eliminate the problem on my setup
(although I kept it running only for a few minutes, just long enough to
get a few messages indicating that migrate_dead_tasks() does move tasks
and the system is still ok).

[ quick hack ]

@@ -5887,6 +5907,7 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
 		next = pick_next_task(rq, rq->curr);
 		if (!next)
 			break;
+		next->sched_class->put_prev_task(rq, next);
 		migrate_dead(dead_cpu, next);
 	}

Just in case, all the changes I've used for this test are attached "as is".

p.s. perhaps I won't be able to verify it carefully till tomorrow's late
evening.
--
Best regards,
Dmitry Adamushko

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: migration-experiment.patch --]
[-- Type: text/x-diff; name=migration-experiment.patch, Size: 4206 bytes --]

diff --git a/kernel/cpu.c b/kernel/cpu.c
index c77bc3a..db92c01 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -15,6 +15,8 @@
 #include <linux/stop_machine.h>
 #include <linux/mutex.h>
 
+extern int sched_check_offline_cpu(int cpu);
+
 /* Serializes the updates to cpu_online_map, cpu_present_map */
 static DEFINE_MUTEX(cpu_add_remove_lock);
 
@@ -247,7 +249,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 
 	/* This actually kills the CPU. */
 	__cpu_die(cpu);
-	
+
 	/* CPU is completely dead: tell everyone. Too late to complain. */
 	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD | mod,
 				    hcpu) == NOTIFY_BAD)
@@ -255,6 +257,8 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 
 	check_for_tasks(cpu);
 
+	sched_check_offline_cpu(cpu);
+
 out_thread:
 	err = kthread_stop(p);
 out_allowed:
@@ -289,6 +293,11 @@ static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
 	if (cpu_online(cpu) || !cpu_present(cpu))
 		return -EINVAL;
 
+	printk("cpu_up:\n");
+	ret = sched_check_offline_cpu(cpu);
+	if (ret)
+		return -EINVAL;
+
 	cpu_hotplug_begin();
 	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
 					-1, &nr_calls);
diff --git a/kernel/sched.c b/kernel/sched.c
index 3aaa5c8..f20fe1c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4135,6 +4135,22 @@ pick_next_task(struct rq *rq, struct task_struct *prev)
 	}
 }
 
+int sched_check_offline_cpu(int cpu)
+{
+	struct task_group *tgi;
+	int ret;
+
+	ret = check_cfs_tree(&cpu_rq(cpu)->cfs);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(tgi, &task_groups, list) {
+		ret += check_cfs_tree(tgi->cfs_rq[cpu]);
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
 /*
  * schedule() is the main scheduler function.
  */
@@ -5712,6 +5728,9 @@ static int __migrate_task_irq(struct task_struct *p, int src_cpu, int dest_cpu)
 	local_irq_disable();
 	ret = __migrate_task(p, src_cpu, dest_cpu);
 	local_irq_enable();
+
+	printk(KERN_ERR "__migrate(%d -- %s) -> cpu (%d) == ret (%d)\n",
+			p->pid, p->comm, dest_cpu, ret);
 
 	return ret;
 }
@@ -5868,6 +5887,7 @@ static void migrate_dead(unsigned int dead_cpu, struct task_struct *p)
 	 * fine.
 	 */
 	spin_unlock_irq(&rq->lock);
+	printk(KERN_ERR "---> migrate_dead(%d -- %s)\n", p->pid, p->comm);
 	move_task_off_dead_cpu(dead_cpu, p);
 	spin_lock_irq(&rq->lock);
 
@@ -5887,6 +5907,7 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
 		next = pick_next_task(rq, rq->curr);
 		if (!next)
 			break;
+		next->sched_class->put_prev_task(rq, next);
 		migrate_dead(dead_cpu, next);
 
 	}
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index be16dfc..6d96890 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1235,10 +1235,13 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
 
 	do {
 		se = pick_next_entity(cfs_rq);
-		if (unlikely(!se))
+		if (unlikely(!se)) {
 			printk(KERN_ERR "BUG: se == NULL but nr_running (%ld), load (%ld),"
 				" rq-nr_running (%ld), rq-load (%ld)\n",
 				cfs_rq->nr_running, cfs_rq->load.weight,
 				rq->nr_running, rq->load.weight);
+//			BUG();
+			return NULL;
+		}
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);
 
@@ -1315,6 +1318,41 @@ static struct task_struct *load_balance_next_fair(void *arg)
 	return __load_balance_iterator(cfs_rq, cfs_rq->balance_iterator);
 }
 
+static int check_cfs_tree(struct cfs_rq *cfs_rq)
+{
+	struct list_head *next = cfs_rq->tasks.next;
+	struct sched_entity *se;
+	struct task_struct *p;
+	int ret = 0;
+
+	if (next == &cfs_rq->tasks)
+		return ret;
+
+	do {
+		se = list_entry(next, struct sched_entity, group_node);
+
+		if (entity_is_task(se)) {
+			p = task_of(se);
+
+			printk("* ERROR: (task) %d - %s\n", p->pid, p->comm);
+			ret = 1;
+		} else {
+			struct cfs_rq *cfs_rq_child = group_cfs_rq(se);
+
+			if (cfs_rq_child->nr_running ||
+			    cfs_rq_child->load.weight) {
+				printk("* ERROR: (group) %ld - %ld\n",
+					cfs_rq_child->nr_running,
+					cfs_rq_child->load.weight);
+				check_cfs_tree(cfs_rq_child);
+				ret = 1;
+			}
+		}
+
+		next = next->next;
+	} while (next != &cfs_rq->tasks);
+
+	return ret;
+}
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static int cfs_rq_best_prio(struct cfs_rq *cfs_rq)
 {

^ permalink raw reply related	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-06-28 22:16 ` Dmitry Adamushko
@ 2008-06-29  6:55   ` Ingo Molnar
  2008-06-30  9:07   ` Heiko Carstens
  1 sibling, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2008-06-29 6:55 UTC (permalink / raw)
To: Dmitry Adamushko; +Cc: Heiko Carstens, Peter Zijlstra, Avi Kivity, linux-kernel

* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> Hello,
>
> it seems to be related to migrate_dead_tasks().
>
> Firstly I added traces to see all tasks being migrated with
> migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem
> pops up (the one with "se == NULL" in the loop of
> pick_next_task_fair()) shortly after the traces indicate that some task
> has been migrated with migrate_dead_tasks(). Btw., I can reproduce it
> much faster now with just a plain cpu down/up loop.
>
> [disclaimer] Well, unless I'm really missing something important in
> this late hour [/disclaimer] pick_next_task() is not something
> appropriate for migrate_dead_tasks() :-)
>
> The following change seems to eliminate the problem on my setup
> (although I kept it running only for a few minutes, just long enough to
> get a few messages indicating that migrate_dead_tasks() does move tasks
> and the system is still ok).
>
> [ quick hack ]
>
> @@ -5887,6 +5907,7 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
>  		next = pick_next_task(rq, rq->curr);
>  		if (!next)
>  			break;
> +		next->sched_class->put_prev_task(rq, next);
>  		migrate_dead(dead_cpu, next);

thanks Dmitry - i've applied this chunk to tip/master and tip/sched/urgent,
for more testing.

if this turns out to be the final and full fix today, would you mind
submitting the rest of your checks as well? It seems like a rather sensible
set of sanity checks. Put it under CONFIG_SCHED_DEBUG or a new (default-off)
config option.

it would also be _very_ nice to have a built-in cpu hotplug tester in the
kernel, a.k.a. CONFIG_RCU_TORTURE_TEST=y. There's already sample code in
kernel/tracing/ showing how to initiate hotplug events from within the
kernel.

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-06-28 22:16 ` Dmitry Adamushko
  2008-06-29  6:55 ` Ingo Molnar
@ 2008-06-30  9:07 ` Heiko Carstens
  2008-06-30  9:17   ` Ingo Molnar
  1 sibling, 1 reply; 28+ messages in thread
From: Heiko Carstens @ 2008-06-30 9:07 UTC (permalink / raw)
To: Dmitry Adamushko; +Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, linux-kernel

On Sun, Jun 29, 2008 at 12:16:56AM +0200, Dmitry Adamushko wrote:
> Hello,
>
> it seems to be related to migrate_dead_tasks().
>
> Firstly I added traces to see all tasks being migrated with
> migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem
> pops up (the one with "se == NULL" in the loop of
> pick_next_task_fair()) shortly after the traces indicate that some task
> has been migrated with migrate_dead_tasks(). Btw., I can reproduce it
> much faster now with just a plain cpu down/up loop.
>
> [disclaimer] Well, unless I'm really missing something important in
> this late hour [/disclaimer] pick_next_task() is not something
> appropriate for migrate_dead_tasks() :-)
>
> The following change seems to eliminate the problem on my setup
> (although I kept it running only for a few minutes, just long enough to
> get a few messages indicating that migrate_dead_tasks() does move tasks
> and the system is still ok).
>
> [ quick hack ]
>
> @@ -5887,6 +5907,7 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
>  		next = pick_next_task(rq, rq->curr);
>  		if (!next)
>  			break;
> +		next->sched_class->put_prev_task(rq, next);
>  		migrate_dead(dead_cpu, next);
>
>  	}

Thanks Dmitry! With your patch I cannot reproduce the bug anymore.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-06-30  9:07 ` Heiko Carstens
@ 2008-06-30  9:17   ` Ingo Molnar
  2008-07-01  9:22     ` Lai Jiangshan
  0 siblings, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2008-06-30 9:17 UTC (permalink / raw)
To: Heiko Carstens
Cc: Dmitry Adamushko, Peter Zijlstra, Avi Kivity, linux-kernel,
	Andrew Morton

* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> On Sun, Jun 29, 2008 at 12:16:56AM +0200, Dmitry Adamushko wrote:
> > Hello,
> >
> > it seems to be related to migrate_dead_tasks().
> >
> > Firstly I added traces to see all tasks being migrated with
> > migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem
> > pops up (the one with "se == NULL" in the loop of
> > pick_next_task_fair()) shortly after the traces indicate that some task
> > has been migrated with migrate_dead_tasks(). Btw., I can reproduce it
> > much faster now with just a plain cpu down/up loop.
> >
> > [disclaimer] Well, unless I'm really missing something important in
> > this late hour [/disclaimer] pick_next_task() is not something
> > appropriate for migrate_dead_tasks() :-)
> >
> > The following change seems to eliminate the problem on my setup
> > (although I kept it running only for a few minutes, just long enough to
> > get a few messages indicating that migrate_dead_tasks() does move tasks
> > and the system is still ok).
> >
> > [ quick hack ]
> >
> > @@ -5887,6 +5907,7 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
> >  		next = pick_next_task(rq, rq->curr);
> >  		if (!next)
> >  			break;
> > +		next->sched_class->put_prev_task(rq, next);
> >  		migrate_dead(dead_cpu, next);
> >
> >  	}
>
> Thanks Dmitry! With your patch I cannot reproduce the bug anymore.

thanks - it passed my testing too. It's lined up for v2.6.26 merge, in
tip/sched/urgent.

Avi, does this patch fix your CPU hotplug problems too?

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-06-30  9:17 ` Ingo Molnar
@ 2008-07-01  9:22   ` Lai Jiangshan
  2008-07-01  9:31     ` Ingo Molnar
  0 siblings, 1 reply; 28+ messages in thread
From: Lai Jiangshan @ 2008-07-01 9:22 UTC (permalink / raw)
To: Ingo Molnar
Cc: Heiko Carstens, Dmitry Adamushko, Peter Zijlstra, Avi Kivity,
	linux-kernel, Andrew Morton

Ingo Molnar wrote:
> * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
>> On Sun, Jun 29, 2008 at 12:16:56AM +0200, Dmitry Adamushko wrote:
>>> Hello,
>>>
>>> it seems to be related to migrate_dead_tasks().
>>>
>>> Firstly I added traces to see all tasks being migrated with
>>> migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem
>>> pops up (the one with "se == NULL" in the loop of
>>> pick_next_task_fair()) shortly after the traces indicate that some task
>>> has been migrated with migrate_dead_tasks(). Btw., I can reproduce it
>>> much faster now with just a plain cpu down/up loop.
>>>
>>> [disclaimer] Well, unless I'm really missing something important in
>>> this late hour [/disclaimer] pick_next_task() is not something
>>> appropriate for migrate_dead_tasks() :-)
>>>
>>> The following change seems to eliminate the problem on my setup
>>> (although I kept it running only for a few minutes, just long enough to
>>> get a few messages indicating that migrate_dead_tasks() does move tasks
>>> and the system is still ok).
>>>
>>> [ quick hack ]
>>>
>>> @@ -5887,6 +5907,7 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
>>>  		next = pick_next_task(rq, rq->curr);
>>>  		if (!next)
>>>  			break;
>>> +		next->sched_class->put_prev_task(rq, next);
>>>  		migrate_dead(dead_cpu, next);
>>>
>>>  	}
>> Thanks Dmitry! With your patch I cannot reproduce the bug anymore.
>
> thanks - it passed my testing too. It's lined up for v2.6.26 merge, in
> tip/sched/urgent.
>
> Avi, does this patch fix your CPU hotplug problems too?
>
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

Hi, Ingo

The following oops still occurred whether this patch is applied or not.

Lai Jiangshan

------------[ cut here ]------------
kernel BUG at kernel/sched.c:6133!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 4744, comm: cpu_online.sh Not tainted 2.6.26-rc8 #1
RIP: 0010:[<ffffffff8058d0a9>]  [<ffffffff8058d0a9>] migration_call+0x3eb/0x494
RSP: 0018:ffff81007115fd28  EFLAGS: 00010202
RAX: ffffffffffffffe3 RBX: ffff810001017580 RCX: 000000801b7c6e42
RDX: ffff81007115fcf8 RSI: 0000009388d2771c RDI: ffff810001017e00
RBP: ffff81007115fd78 R08: ffff81007115e000 R09: ffff8100807d6000
R10: ffff81007fb6d050 R11: 00000000ffffffff R12: 0000000000000283
R13: ffff810001029580 R14: ffff810001029580 R15: 0000000000000002
FS:  00007fbb153d36f0(0000) GS:ffffffff807a3000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fabafe2b0a8 CR3: 0000000076901000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process cpu_online.sh (pid: 4744, threadinfo ffff81007115e000, task ffff810071447200)
Stack:  ffff81007115e000 000000007115fbd8 00000000ffffffff 0000000000000002
 ffff81007115fd78 0000000000000000 00000000ffffffff ffffffff807a1d40
 0000000000000002 0000000000000007 ffff81007115fdb8 ffffffff8059372c
Call Trace:
 [<ffffffff8059372c>] notifier_call_chain+0x33/0x5b
 [<ffffffff802476a9>] __raw_notifier_call_chain+0x9/0xb
 [<ffffffff802476ba>] raw_notifier_call_chain+0xf/0x11
 [<ffffffff805736d6>] _cpu_down+0x191/0x256
 [<ffffffff805737c1>] cpu_down+0x26/0x36
 [<ffffffff805749c1>] store_online+0x32/0x75
 [<ffffffff803d1982>] sysdev_store+0x24/0x26
 [<ffffffff802d2551>] sysfs_write_file+0xe0/0x11c
 [<ffffffff80290e6b>] vfs_write+0xae/0x137
 [<ffffffff802913d3>] sys_write+0x47/0x70
 [<ffffffff8020b1eb>] system_call_after_swapgs+0x7b/0x80
Code: 80 07 00 00 48 01 83 80 07 00 00 49 c7 85 80 07 00 00 00 00 00 00 41 fe 45 00 49 39 dd 74 02 fe 03 41 54 9d 49 83 7d 08 00 74 04 <0f> 0b eb fe 4c 89 ef e8 b8 40 00 00 eb 1e 48 8b 11 48 8b 41 08
RIP  [<ffffffff8058d0a9>] migration_call+0x3eb/0x494
 RSP <ffff81007115fd28>
---[ end trace f22fd757d4f07850 ]---

platform: x86_64 2cores*2cpus fedora9

# cat cpu_online.sh
#!/bin/sh

cpu1=1
cpu2=1
cpu3=1

while ((1))
do
	no=$(($RANDOM % 3 + 1))
	if ((!cpu$no))
	then
		echo 1 > /sys/devices/system/cpu/cpu$no/online
		((cpu$no=1))
	else
		echo 0 > /sys/devices/system/cpu/cpu$no/online
		((cpu$no=0))
	fi
	echo 1 $cpu1 $cpu2 $cpu3
	sleep 2
done

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-07-01  9:22 ` Lai Jiangshan
@ 2008-07-01  9:31   ` Ingo Molnar
  2008-07-01 10:09     ` Lai Jiangshan
  2008-07-02  7:13     ` Lai Jiangshan
  0 siblings, 2 replies; 28+ messages in thread
From: Ingo Molnar @ 2008-07-01 9:31 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Heiko Carstens, Dmitry Adamushko, Peter Zijlstra, Avi Kivity,
	linux-kernel, Andrew Morton

* Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> The following oops still occurred whether this patch is applied or not.

> [<ffffffff8059372c>] notifier_call_chain+0x33/0x5b
> [<ffffffff802476a9>] __raw_notifier_call_chain+0x9/0xb
> [<ffffffff802476ba>] raw_notifier_call_chain+0xf/0x11
> [<ffffffff805736d6>] _cpu_down+0x191/0x256
> [<ffffffff805737c1>] cpu_down+0x26/0x36
> [<ffffffff805749c1>] store_online+0x32/0x75
> [<ffffffff803d1982>] sysdev_store+0x24/0x26
> [<ffffffff802d2551>] sysfs_write_file+0xe0/0x11c
> [<ffffffff80290e6b>] vfs_write+0xae/0x137
> [<ffffffff802913d3>] sys_write+0x47/0x70
> [<ffffffff8020b1eb>] system_call_after_swapgs+0x7b/0x80

hm, there were multiple problems in this area and a lot of dormant bugs.
Do you have this recent upstream commit in your tree:

| commit fcb43042ef55d2f46b0efa5d7746967cef38f056
| Author: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
| Date:   Tue Jun 24 16:06:23 2008 +0800
|
|     x86: fix cpu hotplug crash
|
|     Vegard Nossum reported crashes during cpu hotplug tests:
|
|       http://marc.info/?l=linux-kernel&m=121413950227884&w=4

?

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-07-01  9:31 ` Ingo Molnar
@ 2008-07-01 10:09   ` Lai Jiangshan
  0 siblings, 0 replies; 28+ messages in thread
From: Lai Jiangshan @ 2008-07-01 10:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Heiko Carstens, Dmitry Adamushko, Peter Zijlstra, Avi Kivity,
	linux-kernel, Andrew Morton

Ingo Molnar wrote:
> * Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>
>> The following oops still occurred whether this patch is applied or not.
>
>> [<ffffffff8059372c>] notifier_call_chain+0x33/0x5b
>> [<ffffffff802476a9>] __raw_notifier_call_chain+0x9/0xb
>> [<ffffffff802476ba>] raw_notifier_call_chain+0xf/0x11
>> [<ffffffff805736d6>] _cpu_down+0x191/0x256
>> [<ffffffff805737c1>] cpu_down+0x26/0x36
>> [<ffffffff805749c1>] store_online+0x32/0x75
>> [<ffffffff803d1982>] sysdev_store+0x24/0x26
>> [<ffffffff802d2551>] sysfs_write_file+0xe0/0x11c
>> [<ffffffff80290e6b>] vfs_write+0xae/0x137
>> [<ffffffff802913d3>] sys_write+0x47/0x70
>> [<ffffffff8020b1eb>] system_call_after_swapgs+0x7b/0x80
>
> hm, there were multiple problems in this area and a lot of dormant bugs.
> Do you have this recent upstream commit in your tree:

No, I'll apply this patch and test it again.
Thanks!

> | commit fcb43042ef55d2f46b0efa5d7746967cef38f056
> | Author: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> | Date:   Tue Jun 24 16:06:23 2008 +0800
> |
> |     x86: fix cpu hotplug crash
> |
> |     Vegard Nossum reported crashes during cpu hotplug tests:
> |
> |       http://marc.info/?l=linux-kernel&m=121413950227884&w=4
>
> ?
>
> 	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-07-01  9:31 ` Ingo Molnar
  2008-07-01 10:09 ` Lai Jiangshan
@ 2008-07-02  7:13 ` Lai Jiangshan
  2008-07-02  8:50   ` Dmitry Adamushko
  1 sibling, 1 reply; 28+ messages in thread
From: Lai Jiangshan @ 2008-07-02 7:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: Heiko Carstens, Dmitry Adamushko, Peter Zijlstra, Avi Kivity,
	linux-kernel, Andrew Morton

Ingo Molnar wrote:
> * Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>
>> The following oops still occurred whether this patch is applied or not.
>
>> [<ffffffff8059372c>] notifier_call_chain+0x33/0x5b
>> [<ffffffff802476a9>] __raw_notifier_call_chain+0x9/0xb
>> [<ffffffff802476ba>] raw_notifier_call_chain+0xf/0x11
>> [<ffffffff805736d6>] _cpu_down+0x191/0x256
>> [<ffffffff805737c1>] cpu_down+0x26/0x36
>> [<ffffffff805749c1>] store_online+0x32/0x75
>> [<ffffffff803d1982>] sysdev_store+0x24/0x26
>> [<ffffffff802d2551>] sysfs_write_file+0xe0/0x11c
>> [<ffffffff80290e6b>] vfs_write+0xae/0x137
>> [<ffffffff802913d3>] sys_write+0x47/0x70
>> [<ffffffff8020b1eb>] system_call_after_swapgs+0x7b/0x80
>
> hm, there were multiple problems in this area and a lot of dormant bugs.
> Do you have this recent upstream commit in your tree:

Hi, Ingo

I tested it again with the most recent upstream commits (including the
following patch) applied, and the oops still occurred.

Thanks,
Lai Jiangshan

> | commit fcb43042ef55d2f46b0efa5d7746967cef38f056
> | Author: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> | Date:   Tue Jun 24 16:06:23 2008 +0800
> |
> |     x86: fix cpu hotplug crash
> |
> |     Vegard Nossum reported crashes during cpu hotplug tests:
> |
> |       http://marc.info/?l=linux-kernel&m=121413950227884&w=4
>
> ?
>
> 	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-07-02  7:13 ` Lai Jiangshan
@ 2008-07-02  8:50   ` Dmitry Adamushko
  2008-07-02  9:23     ` Lai Jiangshan
  0 siblings, 1 reply; 28+ messages in thread
From: Dmitry Adamushko @ 2008-07-02 8:50 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Ingo Molnar, Heiko Carstens, Peter Zijlstra, Avi Kivity,
	linux-kernel, Andrew Morton

2008/7/2 Lai Jiangshan <laijs@cn.fujitsu.com>:
> Ingo Molnar wrote:
>> * Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>
>>> The following oops still occurred whether this patch is applied or not.
>>
>>> [<ffffffff8059372c>] notifier_call_chain+0x33/0x5b
>>> [<ffffffff802476a9>] __raw_notifier_call_chain+0x9/0xb
>>> [<ffffffff802476ba>] raw_notifier_call_chain+0xf/0x11
>>> [<ffffffff805736d6>] _cpu_down+0x191/0x256
>>> [<ffffffff805737c1>] cpu_down+0x26/0x36
>>> [<ffffffff805749c1>] store_online+0x32/0x75
>>> [<ffffffff803d1982>] sysdev_store+0x24/0x26
>>> [<ffffffff802d2551>] sysfs_write_file+0xe0/0x11c
>>> [<ffffffff80290e6b>] vfs_write+0xae/0x137
>>> [<ffffffff802913d3>] sys_write+0x47/0x70
>>> [<ffffffff8020b1eb>] system_call_after_swapgs+0x7b/0x80
>>
>> hm, there were multiple problems in this area and a lot of dormant bugs.
>> Do you have this recent upstream commit in your tree:
> Hi, Ingo
> I tested it again with the most recent upstream commits (including the
> following patch) applied, and the oops still occurred.

[ taken from the oops ]
>
> kernel BUG at kernel/sched.c:6133!
>

is it BUG_ON(rq->nr_running != 0); in your sched.c?

hum, it's line #6134 in the recent sched.c version. So with the recent
version it was "kernel BUG at kernel/sched.c:6134!" right?

could you please try to get a crash with my additional debugging patch
(you may find it in this thread) applied?
We should see then all tasks that have been migrated (or failed to be
migrated) during migration_call(CPU_DEAD, ...).

TIA,

--
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug
  2008-07-02  8:50 ` Dmitry Adamushko
@ 2008-07-02  9:23   ` Lai Jiangshan
  2008-07-07 10:26     ` Miao Xie
  0 siblings, 1 reply; 28+ messages in thread
From: Lai Jiangshan @ 2008-07-02 9:23 UTC (permalink / raw)
To: Dmitry Adamushko
Cc: Ingo Molnar, Heiko Carstens, Peter Zijlstra, Avi Kivity,
	linux-kernel, Andrew Morton

Dmitry Adamushko wrote:
> 2008/7/2 Lai Jiangshan <laijs@cn.fujitsu.com>:
>> Ingo Molnar wrote:
>>> * Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>>
>>>> The following oops still occurred whether this patch is applied or not.
>>>> [<ffffffff8059372c>] notifier_call_chain+0x33/0x5b
>>>> [<ffffffff802476a9>] __raw_notifier_call_chain+0x9/0xb
>>>> [<ffffffff802476ba>] raw_notifier_call_chain+0xf/0x11
>>>> [<ffffffff805736d6>] _cpu_down+0x191/0x256
>>>> [<ffffffff805737c1>] cpu_down+0x26/0x36
>>>> [<ffffffff805749c1>] store_online+0x32/0x75
>>>> [<ffffffff803d1982>] sysdev_store+0x24/0x26
>>>> [<ffffffff802d2551>] sysfs_write_file+0xe0/0x11c
>>>> [<ffffffff80290e6b>] vfs_write+0xae/0x137
>>>> [<ffffffff802913d3>] sys_write+0x47/0x70
>>>> [<ffffffff8020b1eb>] system_call_after_swapgs+0x7b/0x80
>>> hm, there were multiple problems in this area and a lot of dormant bugs.
>>> Do you have this recent upstream commit in your tree:
>> Hi, Ingo
>> I tested it again with the most recent upstream commits (including the
>> following patch) applied, and the oops still occurred.
>
> [ taken from the oops ]
>> kernel BUG at kernel/sched.c:6133!
>>
>
> is it BUG_ON(rq->nr_running != 0); in your sched.c?

Yes. I had tested it twice yesterday, once with your patch applied and
once without it (no debugging).

> hum, it's line #6134 in the recent sched.c version. So with the recent
> version it was "kernel BUG at kernel/sched.c:6134!" right?

Yes, with both your patch and Zhang's patch applied, as Ingo advised.

> could you please try to get a crash with my additional debugging patch
> (you may find it in this thread) applied?
> We should see then all tasks that have been migrated (or failed to be
> migrated) during migration_call(CPU_DEAD, ...).
>
Thank you. I'll test it again with your debugging patch applied
and get more info.

> TIA,
>

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] CFS vs cpu hotplug 2008-07-02 9:23 ` Lai Jiangshan @ 2008-07-07 10:26 ` Miao Xie 2008-07-07 11:31 ` Dmitry Adamushko 0 siblings, 1 reply; 28+ messages in thread From: Miao Xie @ 2008-07-07 10:26 UTC (permalink / raw) To: Lai Jiangshan Cc: Dmitry Adamushko, Ingo Molnar, Heiko Carstens, Peter Zijlstra, Avi Kivity, linux-kernel, Andrew Morton on 3:59 Lai Jiangshan wrote: > Dmitry Adamushko wrote: >> 2008/7/2 Lai Jiangshan <laijs@cn.fujitsu.com>: >>> Ingo Molnar wrote: >>>> * Lai Jiangshan <laijs@cn.fujitsu.com> wrote: >>>> >>>>> The following oops still occurred whether this patch is applied or not. >>>>> [<ffffffff8059372c>] notifier_call_chain+0x33/0x5b >>>>> [<ffffffff802476a9>] __raw_notifier_call_chain+0x9/0xb >>>>> [<ffffffff802476ba>] raw_notifier_call_chain+0xf/0x11 >>>>> [<ffffffff805736d6>] _cpu_down+0x191/0x256 >>>>> [<ffffffff805737c1>] cpu_down+0x26/0x36 >>>>> [<ffffffff805749c1>] store_online+0x32/0x75 >>>>> [<ffffffff803d1982>] sysdev_store+0x24/0x26 >>>>> [<ffffffff802d2551>] sysfs_write_file+0xe0/0x11c >>>>> [<ffffffff80290e6b>] vfs_write+0xae/0x137 >>>>> [<ffffffff802913d3>] sys_write+0x47/0x70 >>>>> [<ffffffff8020b1eb>] system_call_after_swapgs+0x7b/0x80 >>>> hm, there were multiple problems in this area and a lot of dormant bugs. >>>> Do you have this recent upstream commit in your tree: >>> Hi, Ingo >>> I tested it again with the most recent upstreams(including the >>> following patch) committed, the oops still occurred. >> [ taken from the oops ] >>> kernel BUG at kernel/sched.c:6133! >>> [snip] >> We should see then all tasks that have been migrated (or failed to be >> migrated) during migration_call(CPU_DEAD, ...). >> > Thank you. I'll test it again with your debugging patch applied > and get more info. I tested it with Dmitry's patch, and found that all the tasks on the offline cpu were migrated to an online cpu by migrate_live_tasks() in migration_call(). 
But some tasks (such as klogd) were moved back to the offline cpu
immediately, before the BUG_ON(rq->nr_running != 0) check, and even before
rq's lock was acquired.

static int __cpuinit
migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
{
	...
	switch (action) {
	...
	case CPU_DEAD:
	case CPU_DEAD_FROZEN:
		cpuset_lock();
		migrate_live_tasks(cpu);
		rq = cpu_rq(cpu);
		...
		spin_lock_irq(&rq->lock);
		...
		migrate_dead_tasks(cpu);
		spin_unlock_irq(&rq->lock);
		cpuset_unlock();
		migrate_nr_uninterruptible(rq);
		BUG_ON(rq->nr_running != 0);
		...
		break;
	}
	...
}

By debugging, I found this bug was caused by select_task_rq_fair().
After migrating the tasks on the offline cpu to an online cpu, the kernel
would quickly wake up these migrated tasks via try_to_wake_up(), which
invokes select_task_rq_fair() to find a lower-load cpu in the sched domains
for them. But the sched domains had not been updated and the offline cpu was
still in them, so select_task_rq_fair() could return the offline cpu's id,
and the bug occurred.

I fixed the bug by checking select_task_rq_fair()'s return value in
try_to_wake_up().

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 kernel/sched.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 94ead43..15b5ddf 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2103,6 +2103,9 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 		goto out_activate;
 
 	cpu = p->sched_class->select_task_rq(p, sync);
+	if (unlikely(cpu_is_offline(cpu)))
+		cpu = orig_cpu;
+
 	if (cpu != orig_cpu) {
 		set_task_cpu(p, cpu);
 		task_rq_unlock(rq, &flags);
-- 
1.5.4.rc3
* Re: [BUG] CFS vs cpu hotplug
  2008-07-07 10:26 ` Miao Xie
@ 2008-07-07 11:31 ` Dmitry Adamushko
  0 siblings, 0 replies; 28+ messages in thread
From: Dmitry Adamushko @ 2008-07-07 11:31 UTC (permalink / raw)
To: miaox
Cc: Lai Jiangshan, Ingo Molnar, Heiko Carstens, Peter Zijlstra,
	Avi Kivity, linux-kernel, Andrew Morton

2008/7/7 Miao Xie <miaox@cn.fujitsu.com>:
> on 3:59 Lai Jiangshan wrote:
>> Dmitry Adamushko wrote:
>>>
>>> [ ... ]
>>>
>>> We should see then all tasks that have been migrated (or failed to be
>>> migrated) during migration_call(CPU_DEAD, ...).
>>>
>> Thank you. I'll test it again with your debugging patch applied
>> and get more info.
>
> I tested it with Dmitry's patch, and found that all the tasks on the offline
> cpu were migrated to an online cpu by migrate_live_tasks() in migration_call().
> But some tasks (such as klogd) were moved back to the offline cpu
> immediately, before the BUG_ON(rq->nr_running != 0) check, and even before
> rq's lock was acquired.
>
> static int __cpuinit
> migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> {
> 	...
> 	switch (action) {
> 	...
> 	case CPU_DEAD:
> 	case CPU_DEAD_FROZEN:
> 		cpuset_lock();
> 		migrate_live_tasks(cpu);
> 		rq = cpu_rq(cpu);
> 		...
> 		spin_lock_irq(&rq->lock);
> 		...
> 		migrate_dead_tasks(cpu);
> 		spin_unlock_irq(&rq->lock);
> 		cpuset_unlock();
> 		migrate_nr_uninterruptible(rq);
> 		BUG_ON(rq->nr_running != 0);
> 		...
> 		break;
> 	}
> 	...
> }
>
> By debugging, I found this bug was caused by select_task_rq_fair().

Thanks for tracking this down!

> After migrating the tasks on the offline cpu to an online cpu, the kernel would
> wake up these migrated tasks quickly by try_to_wake_up(). try_to_wake_up() would
> invoke select_task_rq_fair() to find a lower-load cpu in the sched domains for
> them. But the sched domains weren't updated and the offline cpu was still in
> the sched domains.

Hmm... if so, then this should be fixed, not select_task_rq_fair().
I don't think this is expected behavior.
> So select_task_rq_fair() might return the offline cpu's id, then the
> bug occurred.
>
> I fixed the bug by checking select_task_rq_fair()'s return value in
> try_to_wake_up().
>
> [ ... ]

-- 
Best regards,
Dmitry Adamushko
* Re: [BUG] CFS vs cpu hotplug
@ 2008-07-09 22:32 Dmitry Adamushko
  2008-07-10  7:30 ` Heiko Carstens
  0 siblings, 1 reply; 28+ messages in thread
From: Dmitry Adamushko @ 2008-07-09 22:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: miaox, Lai Jiangshan, Ingo Molnar, Heiko Carstens, Peter Zijlstra,
	Avi Kivity, linux-kernel, Andrew Morton

hm, while looking at this code again...

Ingo,

I think we may have a race between try_to_wake_up() and
migrate_live_tasks() -> move_task_off_dead_cpu(), where the latter
may end up looping endlessly.

Subject: sched: prevent a potentially endless loop in move_task_off_dead_cpu()

Interrupts are enabled on other CPUs when migration_call(CPU_DEAD, ...)
is called, so we may get a race between try_to_wake_up() and
migrate_live_tasks() -> move_task_off_dead_cpu(). The former may push
a task out of a dead CPU, causing the latter to loop endlessly.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>

---
diff --git a/kernel/sched.c b/kernel/sched.c
index 94ead43..9397b87 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5621,8 +5621,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
 
 	double_rq_lock(rq_src, rq_dest);
 	/* Already moved. */
-	if (task_cpu(p) != src_cpu)
+	if (task_cpu(p) != src_cpu) {
+		ret = 1;
 		goto out;
+	}
 	/* Affinity changed (again). */
 	if (!cpu_isset(dest_cpu, p->cpus_allowed))
 		goto out;

---
* Re: [BUG] CFS vs cpu hotplug
  2008-07-09 22:32 Dmitry Adamushko
@ 2008-07-10  7:30 ` Heiko Carstens
  2008-07-10  7:39   ` Ingo Molnar
  0 siblings, 1 reply; 28+ messages in thread
From: Heiko Carstens @ 2008-07-10 7:30 UTC (permalink / raw)
To: Dmitry Adamushko
Cc: Ingo Molnar, miaox, Lai Jiangshan, Peter Zijlstra, Avi Kivity,
	linux-kernel, Andrew Morton

On Thu, Jul 10, 2008 at 12:32:40AM +0200, Dmitry Adamushko wrote:
>
> hm, while looking at this code again...
>
> Ingo,
>
> I think we may have a race between try_to_wake_up() and
> migrate_live_tasks() -> move_task_off_dead_cpu(), where the latter
> may end up looping endlessly.
>
> Subject: sched: prevent a potentially endless loop in move_task_off_dead_cpu()
>
> Interrupts are enabled on other CPUs when migration_call(CPU_DEAD, ...)
> is called, so we may get a race between try_to_wake_up() and
> migrate_live_tasks() -> move_task_off_dead_cpu(). The former may push
> a task out of a dead CPU, causing the latter to loop endlessly.

That's exactly what explains a dump I got yesterday. Thanks for fixing! :)

Will apply your patch and let you know if it fixes the problem
(may take until next week unfortunately).

> Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
>
> ---
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 94ead43..9397b87 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -5621,8 +5621,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>
> 	double_rq_lock(rq_src, rq_dest);
> 	/* Already moved. */
> -	if (task_cpu(p) != src_cpu)
> +	if (task_cpu(p) != src_cpu) {
> +		ret = 1;
> 		goto out;
> +	}
> 	/* Affinity changed (again). */
> 	if (!cpu_isset(dest_cpu, p->cpus_allowed))
> 		goto out;
>
> ---
* Re: [BUG] CFS vs cpu hotplug
  2008-07-10  7:30 ` Heiko Carstens
@ 2008-07-10  7:39 ` Ingo Molnar
  0 siblings, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2008-07-10 7:39 UTC (permalink / raw)
To: Heiko Carstens
Cc: Dmitry Adamushko, miaox, Lai Jiangshan, Peter Zijlstra, Avi Kivity,
	linux-kernel, Andrew Morton

* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> > Subject: sched: prevent a potentially endless loop in
> > move_task_off_dead_cpu()
> >
> > Interrupts are enabled on other CPUs when migration_call(CPU_DEAD,
> > ...) is called, so we may get a race between try_to_wake_up() and
> > migrate_live_tasks() -> move_task_off_dead_cpu(). The former may
> > push a task out of a dead CPU, causing the latter to loop
> > endlessly.
>
> That's exactly what explains a dump I got yesterday. Thanks for
> fixing! :)

applied to tip/sched/urgent via the commit below - let's see whether we
can still get it into v2.6.26.

	Ingo

---------------->
commit dc7fab8b3bb388c57c6c4a43ba68c8a32ca25204
Author: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Date:   Thu Jul 10 00:32:40 2008 +0200

    sched: fix cpu hotplug

    I think we may have a race between try_to_wake_up() and
    migrate_live_tasks() -> move_task_off_dead_cpu(), where the latter
    may end up looping endlessly.

    Interrupts are enabled on other CPUs when migration_call(CPU_DEAD, ...)
    is called, so we may get a race between try_to_wake_up() and
    migrate_live_tasks() -> move_task_off_dead_cpu(). The former may push
    a task out of a dead CPU, causing the latter to loop endlessly.

    Heiko Carstens observed:

    | That's exactly what explains a dump I got yesterday. Thanks for fixing!
    | :)

    Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
    Cc: miaox@cn.fujitsu.com
    Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Avi Kivity <avi@qumranet.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/sched.c b/kernel/sched.c
index 94ead43..9397b87 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5621,8 +5621,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
 
 	double_rq_lock(rq_src, rq_dest);
 	/* Already moved. */
-	if (task_cpu(p) != src_cpu)
+	if (task_cpu(p) != src_cpu) {
+		ret = 1;
 		goto out;
+	}
 	/* Affinity changed (again). */
 	if (!cpu_isset(dest_cpu, p->cpus_allowed))
 		goto out;
end of thread, other threads: [~2008-07-10 7:40 UTC | newest]

Thread overview: 28+ messages

2008-06-19 16:19 [BUG] CFS vs cpu hotplug Heiko Carstens
2008-06-19 18:05 ` Peter Zijlstra
2008-06-19 18:14   ` Peter Zijlstra
2008-06-19 21:14     ` Heiko Carstens
2008-06-19 21:26       ` Peter Zijlstra
2008-06-19 21:17   ` Heiko Carstens
2008-06-19 21:32     ` Peter Zijlstra
2008-06-19 21:49       ` Heiko Carstens
2008-06-20  8:51         ` Peter Zijlstra
2008-06-20 22:19           ` Heiko Carstens
2008-06-20 11:44 ` Dmitry Adamushko
2008-06-20 22:23   ` Heiko Carstens
2008-06-25 22:12 ` Dmitry Adamushko
2008-06-28 22:16   ` Dmitry Adamushko
2008-06-29  6:55     ` Ingo Molnar
2008-06-30  9:07       ` Heiko Carstens
2008-06-30  9:17         ` Ingo Molnar
2008-07-01  9:22           ` Lai Jiangshan
2008-07-01  9:31             ` Ingo Molnar
2008-07-01 10:09               ` Lai Jiangshan
2008-07-02  7:13                 ` Lai Jiangshan
2008-07-02  8:50                   ` Dmitry Adamushko
2008-07-02  9:23                     ` Lai Jiangshan
2008-07-07 10:26                       ` Miao Xie
2008-07-07 11:31                         ` Dmitry Adamushko
  -- strict thread matches above, loose matches on Subject: below --
2008-07-09 22:32 Dmitry Adamushko
2008-07-10  7:30 ` Heiko Carstens
2008-07-10  7:39   ` Ingo Molnar