* [BUG] cpu hotplug vs scheduler @ 2008-05-13 14:33 Avi Kivity 2008-05-13 15:33 ` Avi Kivity 2008-05-14 8:13 ` Dmitry Adamushko 0 siblings, 2 replies; 12+ messages in thread From: Avi Kivity @ 2008-05-13 14:33 UTC (permalink / raw) To: linux-kernel; +Cc: Ingo Molnar I'm testing host cpu hotplug with kvm. Basically running 7 guests on a 4 core machine, offlining and onlining host cpus at random. Eventually I hit this: [4298303.496645] Booting processor 3/7 ip 6000 [4298303.506116] Initializing CPU#3 [4298303.506116] Calibrating delay using timer specific routine.. 5319.66 BogoMIPS (lpj=2659833) [4298303.506116] CPU: L1 I cache: 32K, L1 D cache: 32K [4298303.506116] CPU: L2 cache: 4096K [4298303.506116] CPU: Physical Processor ID: 3 [4298303.506116] CPU: Processor Core ID: 1 [4298303.506116] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106 [4298303.582937] CPU3: Intel(R) Xeon(R) CPU 5150 @ 2.66GHz stepping 06 [4298303.585087] checking TSC synchronization [CPU#0 -> CPU#3]: passed. 
[4298303.707287] Switched to high resolution mode on CPU 3 [4298303.712943] kvm: enabling virtualization on CPU3 [4298303.713955] CPU0 attaching sched-domain: [4298303.713901] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158 [4298303.713901] IP: [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c [4298303.713901] PGD 0 [4298303.713901] Oops: 0000 [1] PREEMPT SMP [4298303.713901] CPU 3 [4298303.713901] Modules linked in: kvm_intel kvm netconsole autofs4 nfs lockd nfs_acl sunrpc bridge llc acpi_cpufreq backlight sg e1000 button serio_raw rtc_cmos rtc_core rtc_lib ata_piix dm_snapshot dm_mod ahci libata dock sd_mod scsi_mod [last unloaded: kvm] [4298303.713901] Pid: 15115, comm: migration/3 Not tainted 2.6.26-rc2 #723 [4298303.713901] RIP: 0010:[<ffffffff8022e722>] [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c [4298303.713901] RSP: 0018:ffff81004fdfbe20 EFLAGS: 00010046 [4298303.713901] RAX: 0000000000000000 RBX: ffff81000103df80 RCX: 0000000000000000 [4298303.713901] RDX: ffff81000103e038 RSI: 000000003b9aca00 RDI: ffff81000103df00 [4298303.713901] RBP: ffff81004fdfbe40 R08: ffff81004fdfbdd0 R09: ffff81000103a0a0 [4298303.713901] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000000 [4298303.713901] R13: ffff81000103df00 R14: ffffffff8060a140 R15: 0000000000000003 [4298303.713901] FS: 0000000000000000(0000) GS:ffff81007f806a80(0000) knlGS:0000000000000000 [4298303.713901] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [4298303.713901] CR2: 0000000000000158 CR3: 0000000000201000 CR4: 00000000000026a0 [4298303.713901] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [4298303.713901] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [4298303.713901] Process migration/3 (pid: 15115, threadinfo ffff81004fdfa000, task ffff81003f5f0700) [4298303.713901] Stack: ffff81004fdfbe40 ffff81000103df00 ffffffff804543c0 ffff810020985bf8 [4298303.713901] ffff81004fdfbee0 ffffffff804373fe ffff81004fdfbe80 
ffffffff8023060a [4298303.713901] ffff81000103df00 ffff81003f5f0700 0000000000000000 ffff81003f5f0a60 [4298303.713901] Call Trace: [4298303.713901] [<ffffffff804373fe>] schedule+0x414/0x6ab [4298303.713901] [<ffffffff8023060a>] ? hrtick_set+0x9d/0xe8 [4298303.713901] [<ffffffff8043772f>] ? thread_return+0x9a/0xbf [4298303.713901] [<ffffffff80231652>] migration_thread+0x185/0x22d [4298303.713901] [<ffffffff802314cd>] ? migration_thread+0x0/0x22d [4298303.713901] [<ffffffff8024afe6>] kthread+0x49/0x77 [4298303.713901] [<ffffffff8020d228>] child_rip+0xa/0x12 [4298303.713901] [<ffffffff8024af9d>] ? kthread+0x0/0x77 [4298303.713901] [<ffffffff8020d21e>] ? child_rip+0x0/0x12 [4298303.713901] [4298303.713901] [4298303.713901] Code: c0 74 28 48 8b 7b 58 4c 8d 60 f0 48 85 ff 74 10 4c 89 e6 e8 df cc ff ff 85 c0 75 04 4c 8b 63 58 4c 89 e6 48 89 df e8 4a e5 ff ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 bf 49 83 ec 38 4c 89 ef 4c [4298303.713901] RIP [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c This seems to be the assignment to cfs_rq after pick_next_entity(). I'm running kvm.git, which is currently 2.6.26-rc2 plus a few kvm patches. It could be kvm's fault, but it doesn't appear so from the traces. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler 2008-05-13 14:33 [BUG] cpu hotplug vs scheduler Avi Kivity @ 2008-05-13 15:33 ` Avi Kivity 2008-05-13 19:00 ` Heiko Carstens 2008-05-14 8:13 ` Dmitry Adamushko 1 sibling, 1 reply; 12+ messages in thread From: Avi Kivity @ 2008-05-13 15:33 UTC (permalink / raw) To: linux-kernel; +Cc: Ingo Molnar Avi Kivity wrote: > I'm testing host cpu hotplug with kvm. Basically running 7 guests on > a 4 core machine, offlining and onlining host cpus at random. > Eventually I hit this: > [snip oops] > This seems to be the assignment to cfs_rq after pick_next_entity(). > > I'm running kvm.git, which is currently 2.6.26-rc2 plus a few kvm > patches. It could be kvm's fault, but it doesn't appear so from the > traces. > I get the same oops running with the kvm modules unloaded. None of kvm.git's changes are in core code, so this is definitely a mainline problem. It took about 40 minutes to reproduce; 1200 cpus onlined and 1200 cpus offlined. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler 2008-05-13 15:33 ` Avi Kivity @ 2008-05-13 19:00 ` Heiko Carstens 0 siblings, 0 replies; 12+ messages in thread From: Heiko Carstens @ 2008-05-13 19:00 UTC (permalink / raw) To: Avi Kivity; +Cc: linux-kernel, Ingo Molnar On Tue, May 13, 2008 at 06:33:00PM +0300, Avi Kivity wrote: > Avi Kivity wrote: >> I'm testing host cpu hotplug with kvm. Basically running 7 guests on >> a 4 core machine, offlining and onlining host cpus at random. >> Eventually I hit this: >> > > [snip oops] > >> This seems to be the assignment to cfs_rq after pick_next_entity(). >> >> I'm running kvm.git, which is currently 2.6.26-rc2 plus a few kvm >> patches. It could be kvm's fault, but it doesn't appear so from the >> traces. >> > > I get the same oops running with the kvm modules unloaded. None of > kvm.git's changes are in core code, so this is definitely a mainline > problem. > > It took about 40 minutes to reproduce; 1200 cpus onlined and 1200 cpus > offlined. FWIW, this happens on s390 as well (without kvm support). But until now I didn't have time to look into it. Didn't report it here yet, since we had plenty of cpu hotplug changes in arch code, so I suspected a bug there. This doesn't seem to be the case. It takes only a few minutes to trigger the bug here. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler 2008-05-13 14:33 [BUG] cpu hotplug vs scheduler Avi Kivity 2008-05-13 15:33 ` Avi Kivity @ 2008-05-14 8:13 ` Dmitry Adamushko 2008-05-14 12:30 ` Avi Kivity 2008-05-21 14:48 ` [BUG] hotplug cpus on ia64 Cliff Wickman 1 sibling, 2 replies; 12+ messages in thread From: Dmitry Adamushko @ 2008-05-14 8:13 UTC (permalink / raw) To: Avi Kivity Cc: linux-kernel, Ingo Molnar, Heiko Carstens, Peter Zijlstra, Srivatsa Vaddagiri [-- Attachment #1: Type: text/plain, Size: 2720 bytes --] Hi, > [ ... ] > > [4298303.713901] Call Trace: > [4298303.713901] [<ffffffff804373fe>] schedule+0x414/0x6ab > [4298303.713901] [<ffffffff8023060a>] ? hrtick_set+0x9d/0xe8 > [4298303.713901] [<ffffffff8043772f>] ? thread_return+0x9a/0xbf > [4298303.713901] [<ffffffff80231652>] migration_thread+0x185/0x22d > [4298303.713901] [<ffffffff802314cd>] ? migration_thread+0x0/0x22d > [4298303.713901] [<ffffffff8024afe6>] kthread+0x49/0x77 > [4298303.713901] [<ffffffff8020d228>] child_rip+0xa/0x12 > [4298303.713901] [<ffffffff8024af9d>] ? kthread+0x0/0x77 > [4298303.713901] [<ffffffff8020d21e>] ? child_rip+0x0/0x12 > [4298303.713901] > [4298303.713901] > [4298303.713901] Code: c0 74 28 48 8b 7b 58 4c 8d 60 f0 48 85 ff 74 10 4c > 89 e6 e8 df cc ff ff 85 c0 75 04 4c 8b 63 58 4c 89 e6 48 89 df e8 4a e5 ff > ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 bf 49 83 ec 38 4c 89 ef 4c > [4298303.713901] RIP [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c > > This seems to be the assignment to cfs_rq after pick_next_entity(). [ cc'ed a few folks. ] So the cfs-tree likely gets out of sync. I presume it won't be reproducible with the CONFIG_SCHED_GROUP options disabled. Anyway, would you try one of these debug patches (not sure about the workability of the second one, though :-/)? Let's check the values of 'cfs_rq->load.weight/nr_running'. 
thanks in advance, (non-whitespace-damaged versions are enclosed) --- --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -1291,6 +1291,12 @@ static struct task_struct *pick_next_task_fair(struct rq *rq) do { se = pick_next_entity(cfs_rq); + + if (unlikely(!se)) + printk(KERN_ERR "BUG: se == NULL but nr_running (%ld), load (%ld)," + " rq-nr_running (%ld), rq-load (%ld)\n", + cfs_rq->nr_running, cfs_rq->load.weight, rq->nr_running, rq->load.weight); + cfs_rq = group_cfs_rq(se); } while (cfs_rq); --- --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -1280,6 +1280,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_str uct *p) resched_task(curr); } +static void sysrq_sched_debug_show(void); + static struct task_struct *pick_next_task_fair(struct rq *rq) { struct task_struct *p; @@ -1291,6 +1293,10 @@ static struct task_struct *pick_next_task_fair(struct rq *rq) do { se = pick_next_entity(cfs_rq); + + if (unlikely(!se)) + sysrq_sched_debug_show(); + cfs_rq = group_cfs_rq(se); } while (cfs_rq); --- -- Best regards, Dmitry Adamushko [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: cfs_rq-debug-2.patch --] [-- Type: text/x-patch; name=cfs_rq-debug-2.patch, Size: 630 bytes --] diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c index e24ecd3..1dcc470 100644 --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -1280,6 +1280,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p) resched_task(curr); } +static void sysrq_sched_debug_show(void); + static struct task_struct *pick_next_task_fair(struct rq *rq) { struct task_struct *p; @@ -1291,6 +1293,10 @@ static struct task_struct *pick_next_task_fair(struct rq *rq) do { se = pick_next_entity(cfs_rq); + + if (unlikely(!se)) + sysrq_sched_debug_show(); + cfs_rq = group_cfs_rq(se); } while (cfs_rq); [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: cfs_rq-debug-3.patch --] [-- Type: text/x-patch; 
name=cfs_rq-debug-3.patch, Size: 542 bytes --] diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c index e24ecd3..e21e020 100644 --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -1291,6 +1291,12 @@ static struct task_struct *pick_next_task_fair(struct rq *rq) do { se = pick_next_entity(cfs_rq); + + if (unlikely(!se)) + printk(KERN_ERR "BUG: se == NULL but nr_running (%ld), load (%ld)," + " rq-nr_running (%ld), rq-load (%ld)\n", + cfs_rq->nr_running, cfs_rq->load.weight, rq->nr_running, rq->load.weight); + cfs_rq = group_cfs_rq(se); } while (cfs_rq); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler 2008-05-14 8:13 ` Dmitry Adamushko @ 2008-05-14 12:30 ` Avi Kivity 2008-05-14 13:05 ` Dmitry Adamushko 2008-05-21 14:48 ` [BUG] hotplug cpus on ia64 Cliff Wickman 1 sibling, 1 reply; 12+ messages in thread From: Avi Kivity @ 2008-05-14 12:30 UTC (permalink / raw) To: Dmitry Adamushko Cc: linux-kernel, Ingo Molnar, Heiko Carstens, Peter Zijlstra, Srivatsa Vaddagiri Dmitry Adamushko wrote: > Hi, > > >> [ ... ] >> >> [4298303.713901] Call Trace: >> [4298303.713901] [<ffffffff804373fe>] schedule+0x414/0x6ab >> [4298303.713901] [<ffffffff8023060a>] ? hrtick_set+0x9d/0xe8 >> [4298303.713901] [<ffffffff8043772f>] ? thread_return+0x9a/0xbf >> [4298303.713901] [<ffffffff80231652>] migration_thread+0x185/0x22d >> [4298303.713901] [<ffffffff802314cd>] ? migration_thread+0x0/0x22d >> [4298303.713901] [<ffffffff8024afe6>] kthread+0x49/0x77 >> [4298303.713901] [<ffffffff8020d228>] child_rip+0xa/0x12 >> [4298303.713901] [<ffffffff8024af9d>] ? kthread+0x0/0x77 >> [4298303.713901] [<ffffffff8020d21e>] ? child_rip+0x0/0x12 >> [4298303.713901] >> [4298303.713901] >> [4298303.713901] Code: c0 74 28 48 8b 7b 58 4c 8d 60 f0 48 85 ff 74 10 4c >> 89 e6 e8 df cc ff ff 85 c0 75 04 4c 8b 63 58 4c 89 e6 48 89 df e8 4a e5 ff >> ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 bf 49 83 ec 38 4c 89 ef 4c >> [4298303.713901] RIP [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c >> >> This seems to be the assignment to cfs_rq after pick_next_entity(). >> > > [ cc'ed a few folks. ] > > > So the cfs-tree likely gets out-of-sync. I pressume, it won't be > reproducible with CONFIG_SCHED_GROUP options being disabled. > > Anyway, would you try one of these debug-patches (not sure about the > workability of the second one though :-/) > > Let's check what are the values for 'cfs_rq->weight.load/nr_running'. 
> > Got this for the first patch: [4302727.615522] Booting processor 3/7 ip 6000 [4302727.625923] Initializing CPU#3 [4302727.625923] Calibrating delay using timer specific routine.. 5319.76 BogoMIPS (lpj=2659883) [4302727.625923] CPU: L1 I cache: 32K, L1 D cache: 32K [4302727.625923] CPU: L2 cache: 4096K [4302727.625923] CPU: Physical Processor ID: 3 [4302727.625923] CPU: Processor Core ID: 1 [4302727.625923] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106 [4302727.692484] CPU3: Intel(R) Xeon(R) CPU 5150 @ 2.66GHz stepping 06 [4302727.694236] checking TSC synchronization [CPU#1 -> CPU#3]: passed. [4302727.824185] Switched to high resolution mode on CPU 3 [4302727.859184] kvm: enabling virtualization on CPU3 [4302727.859714] Sched Debug Version: v0.07, 2.6.26-rc2 #726 [4302727.859714] now at 6918576.148656 msecs [4302727.859714] .sysctl_sched_latency : 60.000000 [4302727.859714] .sysctl_sched_min_granularity : 12.000000 [4302727.859714] .sysctl_sched_wakeup_granularity : 30.000000 [4302727.859714] .sysctl_sched_child_runs_first : 0.000001 [4302727.860191] .sysctl_sched_features : 895 [4302727.860191] [4302727.860191] cpu#0, 2659.999 MHz [4302727.860191] .nr_running : 2 [4302727.860191] .load : 841 [4302727.860191] .nr_switches : 3427530 [4302727.861205] .nr_load_updates : 2183358 [4302727.861205] .nr_uninterruptible : 15 [4302727.861205] .jiffies : 4301585875 [4302727.861205] .next_balance : 4301.585696 [4302727.861205] .curr->pid : 4678 [4302727.861205] .clock : 6918579.002757 [4302727.862216] .cpu_load[0] : 841 [4302727.862216] .cpu_load[1] : 841 [4302727.862216] .cpu_load[2] : 841 [4302727.862216] .cpu_load[3] : 841 [4302727.862216] .cpu_load[4] : 841 [4302727.862216] [4302727.862216] cfs_rq[0]: [4302727.867209] .exec_clock : 3970.569663 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 5178969.408050 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 
[4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 6 [4302727.867209] .shares : 1024 [4302727.867209] [4302727.867209] cfs_rq[0]: [4302727.867209] .exec_clock : 14.588517 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 5178971.405628 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 1024 [4302727.867209] [4302727.867209] cfs_rq[0]: [4302727.867209] .exec_clock : 41.615870 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 5178973.403544 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 2 [4302727.867209] .shares : 1024 [4302727.867209] [4302727.867209] cfs_rq[0]: [4302727.867209] .exec_clock : 0.000000 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 5178975.401320 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867209] cfs_rq[0]: [4302727.867209] .exec_clock : 0.000001 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 5178977.398314 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 0 [4302727.867209] 
[4302727.867209] cfs_rq[0]: [4302727.867209] .exec_clock : 2165242.484786 [4302727.867209] .MIN_vruntime : 10323214.742376 [4302727.867209] .min_vruntime : 5178979.396488 [4302727.867209] .max_vruntime : 10323214.742376 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 [4302727.867209] .nr_running : 2 [4302727.867209] .load : 2048 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 1789825 [4302727.867209] .shares : 843 [4302727.867209] [4302727.867209] runnable tasks: [4302727.867209] task PID tree-key switches prio exec-runtime sum-exec sum-sleep [4302727.867209] ---------------------------------------------------------------------------------------------------------- [4302727.867209] Rqemu-system-x86 4678 10323337.578253 553310 120 10323337.578255 1380505.796830 42439.250368 [4302727.867209] [4302727.867209] cpu#1, 2659.999 MHz [4302727.867209] .nr_running : 3 [4302727.867209] .load : 415 [4302727.867209] .nr_switches : 629498 [4302727.867209] .nr_load_updates : 838874 [4302727.867209] .nr_uninterruptible : -6 [4302727.867209] .jiffies : 4301585895 [4302727.867209] .next_balance : 4301.585634 [4302727.867209] .curr->pid : 7799 [4302727.867209] .clock : 6918576.130865 [4302727.867209] .cpu_load[0] : 415 [4302727.867209] .cpu_load[1] : 415 [4302727.867209] .cpu_load[2] : 415 [4302727.867209] .cpu_load[3] : 415 [4302727.867209] .cpu_load[4] : 415 [4302727.867209] [4302727.867209] cfs_rq[1]: [4302727.867209] .exec_clock : 74.637431 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 759396.868495 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : -4419588.520858 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 299 [4302727.867209] .nr_spread_over : 1 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867209] cfs_rq[1]: [4302727.867209] .exec_clock : 22.707771 [4302727.867209] .MIN_vruntime : 0.000001 
[4302727.867209] .min_vruntime : 759396.868495 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : -4419590.518446 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 299 [4302727.867209] .nr_spread_over : 1 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867209] cfs_rq[1]: [4302727.867209] .exec_clock : 0.033026 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 759396.868495 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : -4419590.518446 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 299 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867209] cfs_rq[1]: [4302727.867209] .exec_clock : 0.000000 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 759396.868495 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : -4419590.518446 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 299 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867209] cfs_rq[1]: [4302727.867209] .exec_clock : 0.026450 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 759396.868495 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : -4419590.518446 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 299 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867210] cfs_rq[1]: [4302727.867210] .exec_clock : 754981.092689 [4302727.867210] .MIN_vruntime : 1239813.449102 [4302727.867210] .min_vruntime : 759396.868495 [4302727.867210] .max_vruntime : 1239819.334711 [4302727.867210] .spread : 5.885609 [4302727.867210] .spread0 : 
-4419590.518446 [4302727.867210] .nr_running : 3 [4302727.867210] .load : 3072 [4302727.867210] .bkl_count : 299 [4302727.867210] .nr_spread_over : 53817 [4302727.867210] .shares : 415 [4302727.867210] [4302727.867210] runnable tasks: [4302727.867210] task PID tree-key switches prio exec-runtime sum-exec sum-sleep [4302727.867210] ---------------------------------------------------------------------------------------------------------- [4302727.900184] qemu-system-x86 4987 1239813.449102 709828 120 1239813.449102 1410504.949783 22865.206608 [4302727.900184] qemu-system-x86 5052 1239819.334711 530481 120 1239819.334711 1365146.519564 50937.064744 [4302727.900184] Rtoggle-processo 7799 1239811.208673 47886 120 1239811.208673 57552.631854 1592798.974913 [4302727.900184] [4302727.900184] cpu#3, 2659.999 MHz [4302727.900184] .nr_running : 1 [4302727.900184] .load : 285 [4302727.900184] .nr_switches : 611209 [4302727.900184] .nr_load_updates : 843051 [4302727.900184] .nr_uninterruptible : -2 [4302727.900184] .jiffies : 4301585916 [4302727.900184] .next_balance : 4301.586873 [4302727.900184] .curr->pid : 0 [4302727.900184] .clock : 6918576.376068 [4302727.900184] .cpu_load[0] : 0 [4302727.900184] .cpu_load[1] : 0 [4302727.900184] .cpu_load[2] : 0 [4302727.900184] .cpu_load[3] : 181 [4302727.900184] .cpu_load[4] : 1108 [4302727.900184] [4302727.900184] cfs_rq[3]: [4302727.900184] .exec_clock : 8.224765 [4302727.900184] BUG: spinlock recursion on CPU#3, swapper/0 [4302727.900184] lock: ffff81000103df00, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 3 [4302727.900184] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #726 [4302727.900184] [4302727.900184] Call Trace: [4302727.900184] [<ffffffff803249de>] spin_bug+0x9e/0xe9 [4302727.900184] [<ffffffff80324af4>] _raw_spin_lock+0x41/0x123 [4302727.900184] [<ffffffff80439638>] _spin_lock_irqsave+0x2f/0x37 [4302727.900184] [<ffffffff8022ef7c>] print_cfs_rq+0xca/0x46a [4302727.900184] [<ffffffff80231f97>] 
sched_debug_show+0x7a3/0xb8c [4302727.900184] [<ffffffff8023238d>] sysrq_sched_debug_show+0xd/0xf [4302727.900184] [<ffffffff802323ee>] pick_next_task_fair+0x5f/0x86 [4302727.900184] [<ffffffff804373f6>] schedule+0x3fc/0x6ab [4302727.900184] [<ffffffff8024e03f>] ? ktime_get_ts+0x49/0x4e [4302727.900184] [<ffffffff80253a28>] ? tick_nohz_stop_idle+0x2d/0x54 [4302727.900184] [<ffffffff8021283f>] ? mwait_idle+0x0/0x59 [4302727.900184] [<ffffffff8020ae37>] cpu_idle+0xc8/0xd7 [4302727.900184] [<ffffffff804332f1>] start_secondary+0x173/0x178 [4302727.900184] -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler 2008-05-14 12:30 ` Avi Kivity @ 2008-05-14 13:05 ` Dmitry Adamushko 2008-05-15 10:19 ` Avi Kivity 0 siblings, 1 reply; 12+ messages in thread From: Dmitry Adamushko @ 2008-05-14 13:05 UTC (permalink / raw) To: Avi Kivity Cc: linux-kernel, Ingo Molnar, Heiko Carstens, Peter Zijlstra, Srivatsa Vaddagiri [-- Attachment #1: Type: text/plain, Size: 2368 bytes --] 2008/5/14 Avi Kivity <avi@qumranet.com>: > [ ... ] > > [4302727.900184] Call Trace: > [4302727.900184] [<ffffffff803249de>] spin_bug+0x9e/0xe9 > [4302727.900184] [<ffffffff80324af4>] _raw_spin_lock+0x41/0x123 > [4302727.900184] [<ffffffff80439638>] _spin_lock_irqsave+0x2f/0x37 > [4302727.900184] [<ffffffff8022ef7c>] print_cfs_rq+0xca/0x46a > [4302727.900184] [<ffffffff80231f97>] sched_debug_show+0x7a3/0xb8c > [4302727.900184] [<ffffffff8023238d>] sysrq_sched_debug_show+0xd/0xf > [4302727.900184] [<ffffffff802323ee>] pick_next_task_fair+0x5f/0x86 Err... sorry for the broken patch. The patch below on top of the previous one should address this issue (ugly, but should be ok for debugging). 'tasklist_lock' shouldn't cause a double lock, I guess. Sorry for rather 'blind' attempts. If no, then I'll prepare/test/take-a-closer-look at it later today when I'm at home. 
TIA, ------ kernel/sched_debug-prev.c 2008-05-14 14:53:28.000000000 +0200 +++ kernel/sched_debug.c 2008-05-14 14:58:12.000000000 +0200 @@ -125,6 +125,7 @@ void print_cfs_rq(struct seq_file *m, in char path[128] = ""; struct cgroup *cgroup = NULL; struct task_group *tg = cfs_rq->tg; + int was_locked; if (tg) cgroup = tg->css.cgroup; @@ -138,7 +139,11 @@ void print_cfs_rq(struct seq_file *m, in SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "exec_clock", SPLIT_NS(cfs_rq->exec_clock)); - spin_lock_irqsave(&rq->lock, flags); + was_locked = spin_is_locked(&rq->lock); + + if (!was_locked) + spin_lock_irqsave(&rq->lock, flags); + if (cfs_rq->rb_leftmost) MIN_vruntime = (__pick_next_entity(cfs_rq))->vruntime; last = __pick_last_entity(cfs_rq); @@ -146,7 +151,10 @@ void print_cfs_rq(struct seq_file *m, in max_vruntime = last->vruntime; min_vruntime = rq->cfs.min_vruntime; rq0_min_vruntime = per_cpu(runqueues, 0).cfs.min_vruntime; - spin_unlock_irqrestore(&rq->lock, flags); + + if (!was_locked) + spin_unlock_irqrestore(&rq->lock, flags); + SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "MIN_vruntime", SPLIT_NS(MIN_vruntime)); SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "min_vruntime", --- -- Best regards, Dmitry Adamushko [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: is_locked.diff --] [-- Type: text/x-patch; name=is_locked.diff, Size: 1211 bytes --] --- kernel/sched_debug-prev.c 2008-05-14 14:53:28.000000000 +0200 +++ kernel/sched_debug.c 2008-05-14 14:58:12.000000000 +0200 @@ -125,6 +125,7 @@ void print_cfs_rq(struct seq_file *m, in char path[128] = ""; struct cgroup *cgroup = NULL; struct task_group *tg = cfs_rq->tg; + int was_locked; if (tg) cgroup = tg->css.cgroup; @@ -138,7 +139,11 @@ void print_cfs_rq(struct seq_file *m, in SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "exec_clock", SPLIT_NS(cfs_rq->exec_clock)); - spin_lock_irqsave(&rq->lock, flags); + was_locked = spin_is_locked(&rq->lock); + + if (!was_locked) + spin_lock_irqsave(&rq->lock, flags); + if 
(cfs_rq->rb_leftmost) MIN_vruntime = (__pick_next_entity(cfs_rq))->vruntime; last = __pick_last_entity(cfs_rq); @@ -146,7 +151,10 @@ void print_cfs_rq(struct seq_file *m, in max_vruntime = last->vruntime; min_vruntime = rq->cfs.min_vruntime; rq0_min_vruntime = per_cpu(runqueues, 0).cfs.min_vruntime; - spin_unlock_irqrestore(&rq->lock, flags); + + if (!was_locked) + spin_unlock_irqrestore(&rq->lock, flags); + SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "MIN_vruntime", SPLIT_NS(MIN_vruntime)); SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "min_vruntime", ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler 2008-05-14 13:05 ` Dmitry Adamushko @ 2008-05-15 10:19 ` Avi Kivity 2008-05-21 12:31 ` Heiko Carstens 0 siblings, 1 reply; 12+ messages in thread From: Avi Kivity @ 2008-05-15 10:19 UTC (permalink / raw) To: Dmitry Adamushko Cc: linux-kernel, Ingo Molnar, Heiko Carstens, Peter Zijlstra, Srivatsa Vaddagiri Dmitry Adamushko wrote: > 2008/5/14 Avi Kivity <avi@qumranet.com>: > > >> [ ... ] >> >> [4302727.900184] Call Trace: >> [4302727.900184] [<ffffffff803249de>] spin_bug+0x9e/0xe9 >> [4302727.900184] [<ffffffff80324af4>] _raw_spin_lock+0x41/0x123 >> [4302727.900184] [<ffffffff80439638>] _spin_lock_irqsave+0x2f/0x37 >> [4302727.900184] [<ffffffff8022ef7c>] print_cfs_rq+0xca/0x46a >> [4302727.900184] [<ffffffff80231f97>] sched_debug_show+0x7a3/0xb8c >> [4302727.900184] [<ffffffff8023238d>] sysrq_sched_debug_show+0xd/0xf >> [4302727.900184] [<ffffffff802323ee>] pick_next_task_fair+0x5f/0x86 >> > > > Err... sorry for the broken patch. The patch below on top of the > previous one should address this issue (ugly, but should be ok for > debugging). > 'tasklist_lock' shouldn't cause a double lock, I guess. > > Sorry for rather 'blind' attempts. If no, then I'll > prepare/test/take-a-closer-look at it later today when I'm at home. > > Now it died when allocating the pda: [4301986.526627] Booting processor 1/6 ip 6000 [4301986.537195] swapper: page allocation failure. order:2, mode:0x20 [4301986.537195] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #728 [4301986.537195] [4301986.537195] Call Trace: [4301986.537195] [<ffffffff802388ae>] ? __printk_ratelimit+0x9/0xb [4301986.537195] [<ffffffff80272e07>] __alloc_pages_internal+0x387/0x39f [4301986.537195] [<ffffffff80272e35>] __alloc_pages+0xb/0xd [4301986.537195] [<ffffffff80272e4c>] __get_free_pages+0x15/0x46 [4301986.537195] [<ffffffff80210cdf>] pda_init+0x87/0xac [4301986.537195] [<ffffffff8042fed1>] cpu_init+0x5e/0x34a [4301986.537195] [<ffffffff8026b437>] ? 
rcu_needs_cpu+0x36/0x47 [4301986.537195] [<ffffffff802311a7>] ? idle_task_exit+0x71/0xad [4301986.537195] [<ffffffff804331aa>] start_secondary+0xc/0x178 [4301986.537195] [4301986.537195] Mem-info: [4301986.537195] DMA per-cpu: [4301986.537195] CPU 0: hi: 0, btch: 1 usd: 0 [4301986.537195] CPU 2: hi: 0, btch: 1 usd: 0 [4301986.537195] DMA32 per-cpu: [4301986.537195] CPU 0: hi: 186, btch: 31 usd: 153 [4301986.537195] CPU 2: hi: 186, btch: 31 usd: 135 [4301986.537195] Active:347879 inactive:109278 dirty:107 writeback:0 unstable:0 [4301986.537195] free:5311 slab:10186 mapped:30496 pagetables:1587 bounce:0 [4301986.537195] DMA free:8012kB min:24kB low:28kB high:36kB active:60kB inactive:0kB present:8644kB pages_scanned:0 all_unreclaimable? no [4301986.537195] lowmem_reserve[]: 0 1999 1999 1999 [4301986.537195] DMA32 free:13232kB min:5704kB low:7128kB high:8556kB active:1391456kB inactive:437112kB present:2047248kB pages_scanned:0 all_unreclaimable? no [4301986.537195] lowmem_reserve[]: 0 0 0 0 [4301986.537195] DMA: 19*4kB 16*8kB 10*16kB 11*32kB 6*64kB 4*128kB 1*256kB 2*512kB 3*1024kB 1*2048kB 0*4096kB = 8012kB [4301986.537195] DMA32: 2034*4kB 570*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13224kB [4301986.537195] 212750 total pagecache pages [4301986.537195] Swap cache: add 73, delete 18, find 0/0 [4301986.537195] Free swap = 3210964kB [4301986.537195] Total swap = 3211256kB There seem to be some order-2 allocations available, though. And in any case, cpu hotplug shouldn't crash on such failures, only fail. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-15 10:19 ` Avi Kivity
@ 2008-05-21 12:31 ` Heiko Carstens
  2008-05-21 12:42 ` Avi Kivity
  0 siblings, 1 reply; 12+ messages in thread

From: Heiko Carstens @ 2008-05-21 12:31 UTC (permalink / raw)
To: Avi Kivity
Cc: Dmitry Adamushko, linux-kernel, Ingo Molnar, Peter Zijlstra,
    Srivatsa Vaddagiri

> Now it died when allocating the pda:
>
> [4301986.526627] Booting processor 1/6 ip 6000
> [4301986.537195] swapper: page allocation failure. order:2, mode:0x20
> [4301986.537195] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #728
> [4301986.537195]
> [4301986.537195] Call Trace:
> [4301986.537195] [<ffffffff802388ae>] ? __printk_ratelimit+0x9/0xb
> [4301986.537195] [<ffffffff80272e07>] __alloc_pages_internal+0x387/0x39f
> [4301986.537195] [<ffffffff80272e35>] __alloc_pages+0xb/0xd
> [4301986.537195] [<ffffffff80272e4c>] __get_free_pages+0x15/0x46
> [4301986.537195] [<ffffffff80210cdf>] pda_init+0x87/0xac
> [4301986.537195] [<ffffffff8042fed1>] cpu_init+0x5e/0x34a
> [4301986.537195] [<ffffffff8026b437>] ? rcu_needs_cpu+0x36/0x47
> [4301986.537195] [<ffffffff802311a7>] ? idle_task_exit+0x71/0xad
> [4301986.537195] [<ffffffff804331aa>] start_secondary+0xc/0x178
> [4301986.537195]
> [4301986.537195] Mem-info:
> [4301986.537195] DMA per-cpu:
> [4301986.537195] CPU 0: hi: 0, btch: 1 usd: 0
> [4301986.537195] CPU 2: hi: 0, btch: 1 usd: 0
> [4301986.537195] DMA32 per-cpu:
> [4301986.537195] CPU 0: hi: 186, btch: 31 usd: 153
> [4301986.537195] CPU 2: hi: 186, btch: 31 usd: 135
> [4301986.537195] Active:347879 inactive:109278 dirty:107 writeback:0
> unstable:0
> [4301986.537195] free:5311 slab:10186 mapped:30496 pagetables:1587 bounce:0
> [4301986.537195] DMA free:8012kB min:24kB low:28kB high:36kB active:60kB
> inactive:0kB present:8644kB pages_scanned:0 all_unreclaimable? no
> [4301986.537195] lowmem_reserve[]: 0 1999 1999 1999
> [4301986.537195] DMA32 free:13232kB min:5704kB low:7128kB high:8556kB
> active:1391456kB inactive:437112kB present:2047248kB pages_scanned:0
> all_unreclaimable? no
> [4301986.537195] lowmem_reserve[]: 0 0 0 0
> [4301986.537195] DMA: 19*4kB 16*8kB 10*16kB 11*32kB 6*64kB 4*128kB
> 1*256kB 2*512kB 3*1024kB 1*2048kB 0*4096kB = 8012kB
> [4301986.537195] DMA32: 2034*4kB 570*8kB 1*16kB 0*32kB 0*64kB 0*128kB
> 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13224kB
> [4301986.537195] 212750 total pagecache pages
> [4301986.537195] Swap cache: add 73, delete 18, find 0/0
> [4301986.537195] Free swap = 3210964kB
> [4301986.537195] Total swap = 3211256kB
>
> There seem to be some order-2 allocations available, though. And in any
> case, cpu hotplug shouldn't crash on such failures, only fail.

Hi Avi,

I think you fixed this specific bug. Does the cpu hotplug stresstest now
work for you?
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-21 12:31 ` Heiko Carstens
@ 2008-05-21 12:42 ` Avi Kivity
  2008-05-21 12:55 ` Heiko Carstens
  0 siblings, 1 reply; 12+ messages in thread

From: Avi Kivity @ 2008-05-21 12:42 UTC (permalink / raw)
To: Heiko Carstens
Cc: Dmitry Adamushko, linux-kernel, Ingo Molnar, Peter Zijlstra,
    Srivatsa Vaddagiri

Heiko Carstens wrote:
>> Now it died when allocating the pda:
>>
>> [4301986.526627] Booting processor 1/6 ip 6000
>> [4301986.537195] swapper: page allocation failure. order:2, mode:0x20
>> [4301986.537195] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #728
>> [4301986.537195]
>> [4301986.537195] Call Trace:
>> [4301986.537195] [<ffffffff802388ae>] ? __printk_ratelimit+0x9/0xb
>> [4301986.537195] [<ffffffff80272e07>] __alloc_pages_internal+0x387/0x39f
>> [4301986.537195] [<ffffffff80272e35>] __alloc_pages+0xb/0xd
>> [4301986.537195] [<ffffffff80272e4c>] __get_free_pages+0x15/0x46
>> [4301986.537195] [<ffffffff80210cdf>] pda_init+0x87/0xac
>> [4301986.537195] [<ffffffff8042fed1>] cpu_init+0x5e/0x34a
>> [4301986.537195] [<ffffffff8026b437>] ? rcu_needs_cpu+0x36/0x47
>> [4301986.537195] [<ffffffff802311a7>] ? idle_task_exit+0x71/0xad
>> [4301986.537195] [<ffffffff804331aa>] start_secondary+0xc/0x178
>> [4301986.537195]
>> [4301986.537195] Mem-info:
>> [4301986.537195] DMA per-cpu:
>> [4301986.537195] CPU 0: hi: 0, btch: 1 usd: 0
>> [4301986.537195] CPU 2: hi: 0, btch: 1 usd: 0
>> [4301986.537195] DMA32 per-cpu:
>> [4301986.537195] CPU 0: hi: 186, btch: 31 usd: 153
>> [4301986.537195] CPU 2: hi: 186, btch: 31 usd: 135
>> [4301986.537195] Active:347879 inactive:109278 dirty:107 writeback:0
>> unstable:0
>> [4301986.537195] free:5311 slab:10186 mapped:30496 pagetables:1587 bounce:0
>> [4301986.537195] DMA free:8012kB min:24kB low:28kB high:36kB active:60kB
>> inactive:0kB present:8644kB pages_scanned:0 all_unreclaimable? no
>> [4301986.537195] lowmem_reserve[]: 0 1999 1999 1999
>> [4301986.537195] DMA32 free:13232kB min:5704kB low:7128kB high:8556kB
>> active:1391456kB inactive:437112kB present:2047248kB pages_scanned:0
>> all_unreclaimable? no
>> [4301986.537195] lowmem_reserve[]: 0 0 0 0
>> [4301986.537195] DMA: 19*4kB 16*8kB 10*16kB 11*32kB 6*64kB 4*128kB
>> 1*256kB 2*512kB 3*1024kB 1*2048kB 0*4096kB = 8012kB
>> [4301986.537195] DMA32: 2034*4kB 570*8kB 1*16kB 0*32kB 0*64kB 0*128kB
>> 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13224kB
>> [4301986.537195] 212750 total pagecache pages
>> [4301986.537195] Swap cache: add 73, delete 18, find 0/0
>> [4301986.537195] Free swap = 3210964kB
>> [4301986.537195] Total swap = 3211256kB
>>
>> There seem to be some order-2 allocations available, though. And in any
>> case, cpu hotplug shouldn't crash on such failures, only fail.
>
> Hi Avi,
>
> I think you fixed this specific bug. Does cpu hotplug stresstest now work
> for you?

Which bug, the order-2 allocs? I certainly didn't fix it.

Last I tried, cpu hotplug still failed, but it was some time ago, as
patches flow.

--
error compiling committee.c: too many arguments to function
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-21 12:42 ` Avi Kivity
@ 2008-05-21 12:55 ` Heiko Carstens
  2008-05-21 13:03 ` Avi Kivity
  0 siblings, 1 reply; 12+ messages in thread

From: Heiko Carstens @ 2008-05-21 12:55 UTC (permalink / raw)
To: Avi Kivity
Cc: Dmitry Adamushko, linux-kernel, Ingo Molnar, Peter Zijlstra,
    Srivatsa Vaddagiri

On Wed, May 21, 2008 at 03:42:51PM +0300, Avi Kivity wrote:
> Heiko Carstens wrote:
>> I think you fixed this specific bug. Does cpu hotplug stresstest now work
>> for you?
>
> Which bug, the order-2 allocs? I certainly didn't fix it.
>
> Last I tried, cpu hotplug still failed, but it was some time ago as
> patches flow.

Ah no. I thought "x86: fix crash on cpu hotplug on pat-incapable machines"
would have fixed the cpu hotplug bug for you.

I was actually talking about the pick_next_task_fair incarnation of one of
the cpu hotplug bugs. The s390 backtrace on 2.6.25 looks like this:

<1>Unable to handle kernel pointer dereference at virtual kernel address 00000200000db000
<4>Oops: 003b [#1] PREEMPT SMP
<4>Modules linked in: dm_multipath sunrpc qeth_l2 dm_mod cu3088 qeth ccwgroup
<4>CPU: 0 Not tainted 2.6.25-27.x.20080422-s390xdefault #1
<4>Process migration/0 (pid: 5806, task: 00000000aac3a838, ksp: 000000017a2f3dd0)
<4>Krnl PSW : 0400200180000000 0000000000125f48 (pick_next_task_fair+0x34/0xd4)
<4>           R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:2 PM:0 EA:3
<4>Krnl GPRS: 0000079e029c6ecb 0000000000000000 000000008548ea60 000000008548d700
<4>           00000000004526f8 0000000000000597 0000000000000000 0000000000000000
<4>           0000000000672a00 0000000000000000 00000000aac3a838 000000017a2f3d10
<4>           00000200000db85a 00000000004672a0 000000017a2f3d50 000000017a2f3d10
<4>Krnl Code: 0000000000125f38: e3e0f0980024  stg %r14,152(%r15)
<4>           0000000000125f3e: d507d000c010  clc 0(8,%r13),16(%r12)
<4>           0000000000125f44: a784004e      brc 8,125fe0
<4>          >0000000000125f48: d507d000c030  clc 0(8,%r13),48(%r12)
<4>           0000000000125f4e: b904002c      lgr %r2,%r12
<4>           0000000000125f52: a7990000      lghi %r9,0
<4>           0000000000125f56: a7840033      brc 8,125fbc
<4>           0000000000125f5a: c0e5ffffee23  brasl %r14,123ba0
<4>Call Trace:
<4>([<00000000aac3a838>] 0xaac3a838)
<4> [<000000000045270c>] schedule+0x178/0x978
<4> [<000000000012fe86>] migration_thread+0x1aa/0x30c
<4> [<0000000000152ad8>] kthread+0x68/0xa0
<4> [<000000000010a0be>] kernel_thread_starter+0x6/0xc
<4> [<000000000010a0b8>] kernel_thread_starter+0x0/0xc
<4>Last Breaking-Event-Address:
<4> [<0000000000125fc6>] pick_next_task_fair+0xb2/0xd4
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-21 12:55 ` Heiko Carstens
@ 2008-05-21 13:03 ` Avi Kivity
  0 siblings, 0 replies; 12+ messages in thread

From: Avi Kivity @ 2008-05-21 13:03 UTC (permalink / raw)
To: Heiko Carstens
Cc: Dmitry Adamushko, linux-kernel, Ingo Molnar, Peter Zijlstra,
    Srivatsa Vaddagiri

Heiko Carstens wrote:
> On Wed, May 21, 2008 at 03:42:51PM +0300, Avi Kivity wrote:
>> Heiko Carstens wrote:
>>> I think you fixed this specific bug. Does cpu hotplug stresstest now work
>>> for you?
>>
>> Which bug, the order-2 allocs? I certainly didn't fix it.
>>
>> Last I tried, cpu hotplug still failed, but it was some time ago as
>> patches flow.
>
> Ah no. I thought "x86: fix crash on cpu hotplug on pat-incapable machines"
> would have fixed the cpu hotplug bug for you.

No, that only fixes an immediate oops when running in a virtual machine
(which doesn't have pat).

> I was actually talking of the pick_next_task_fair incarnation of one of
> the cpu hotplug bugs.

That hasn't been addressed, AFAIK.

> s390 backtrace on 2.6.25 looks like this.
>
> <1>Unable to handle kernel pointer dereference at virtual kernel address 00000200000db000
> <4>Oops: 003b [#1] PREEMPT SMP
> <4>Modules linked in: dm_multipath sunrpc qeth_l2 dm_mod cu3088 qeth ccwgroup
> <4>CPU: 0 Not tainted 2.6.25-27.x.20080422-s390xdefault #1
> <4>Process migration/0 (pid: 5806, task: 00000000aac3a838, ksp: 000000017a2f3dd0)
> <4>Krnl PSW : 0400200180000000 0000000000125f48 (pick_next_task_fair+0x34/0xd4)
> <4>           R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:2 PM:0 EA:3
> <4>Krnl GPRS: 0000079e029c6ecb 0000000000000000 000000008548ea60 000000008548d700
> <4>           00000000004526f8 0000000000000597 0000000000000000 0000000000000000
> <4>           0000000000672a00 0000000000000000 00000000aac3a838 000000017a2f3d10
> <4>           00000200000db85a 00000000004672a0 000000017a2f3d50 000000017a2f3d10
> <4>Krnl Code: 0000000000125f38: e3e0f0980024  stg %r14,152(%r15)
> <4>           0000000000125f3e: d507d000c010  clc 0(8,%r13),16(%r12)
> <4>           0000000000125f44: a784004e      brc 8,125fe0
> <4>          >0000000000125f48: d507d000c030  clc 0(8,%r13),48(%r12)
> <4>           0000000000125f4e: b904002c      lgr %r2,%r12
> <4>           0000000000125f52: a7990000      lghi %r9,0
> <4>           0000000000125f56: a7840033      brc 8,125fbc
> <4>           0000000000125f5a: c0e5ffffee23  brasl %r14,123ba0
> <4>Call Trace:
> <4>([<00000000aac3a838>] 0xaac3a838)
> <4> [<000000000045270c>] schedule+0x178/0x978
> <4> [<000000000012fe86>] migration_thread+0x1aa/0x30c
> <4> [<0000000000152ad8>] kthread+0x68/0xa0
> <4> [<000000000010a0be>] kernel_thread_starter+0x6/0xc
> <4> [<000000000010a0b8>] kernel_thread_starter+0x0/0xc
> <4>Last Breaking-Event-Address:
> <4> [<0000000000125fc6>] pick_next_task_fair+0xb2/0xd4

Seems to be the same bug, yes.

--
error compiling committee.c: too many arguments to function
* [BUG] hotplug cpus on ia64
  2008-05-14  8:13 ` Dmitry Adamushko
  2008-05-14 12:30 ` Avi Kivity
@ 2008-05-21 14:48 ` Cliff Wickman
  1 sibling, 0 replies; 12+ messages in thread

From: Cliff Wickman @ 2008-05-21 14:48 UTC (permalink / raw)
To: dmitry.adamushko, avi, mingo, heiko.carstens, a.p.zijlstra, vatsa
Cc: Avi Kivity, linux-kernel, Ingo Molnar, Heiko Carstens, PeterZijlst

Gentlemen,

I built an ia64 kernel from Andrew's tree (2.6.26-rc2-mm1) and get a very
predictable hotplug cpu problem.

billberry1:/tmp/cpw # ./dis
disabled cpu 17
enabled cpu 17
billberry1:/tmp/cpw # ./dis
disabled cpu 17
enabled cpu 17
billberry1:/tmp/cpw # ./dis

The script that disables the cpu always hangs (unkillable) on the 3rd
attempt. I haven't spent any debugging time on it yet. Just wondering if
you've seen it? It doesn't seem to happen on x86_64.

-Cliff Wickman
cpw@sgi.com
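[Illustrative aside, not part of the original mails.] The ./dis script itself is never shown in the thread, only its output. On Linux, this kind of offline/online cycling is normally done by writing 0 and 1 to the cpu's sysfs "online" file. A hypothetical reconstruction follows; the function name and the overridable SYSFS root are assumptions added so the logic can be exercised against a scratch directory instead of a live /sys:

```shell
# Hypothetical "./dis"-style hotplug cycle.  SYSFS defaults to /sys but can
# be pointed at a scratch tree for dry runs; the cpu number defaults to 17
# to match the output quoted in the mail.
cycle_cpu() {
    sysfs="${SYSFS:-/sys}"
    cpu="${1:-17}"
    online="$sysfs/devices/system/cpu/cpu$cpu/online"

    echo 0 > "$online" && echo "disabled cpu $cpu"
    echo 1 > "$online" && echo "enabled cpu $cpu"
}
```

Run as root against the real /sys, each invocation prints the same two lines Cliff quotes; the hang he reports would be the first write blocking inside the offline path.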
end of thread, other threads: [~2008-05-21 14:45 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-05-13 14:33 [BUG] cpu hotplug vs scheduler Avi Kivity
2008-05-13 15:33 ` Avi Kivity
2008-05-13 19:00 ` Heiko Carstens
2008-05-14  8:13 ` Dmitry Adamushko
2008-05-14 12:30 ` Avi Kivity
2008-05-14 13:05 ` Dmitry Adamushko
2008-05-15 10:19 ` Avi Kivity
2008-05-21 12:31 ` Heiko Carstens
2008-05-21 12:42 ` Avi Kivity
2008-05-21 12:55 ` Heiko Carstens
2008-05-21 13:03 ` Avi Kivity
2008-05-21 14:48 ` [BUG] hotplug cpus on ia64 Cliff Wickman