* [BUG] cpu hotplug vs scheduler
@ 2008-05-13 14:33 Avi Kivity
  2008-05-13 15:33 ` Avi Kivity
  2008-05-14  8:13 ` Dmitry Adamushko
  0 siblings, 2 replies; 12+ messages in thread
From: Avi Kivity @ 2008-05-13 14:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

I'm testing host cpu hotplug with kvm. Basically running 7 guests on a
4 core machine, offlining and onlining host cpus at random. Eventually
I hit this:

[4298303.496645] Booting processor 3/7 ip 6000
[4298303.506116] Initializing CPU#3
[4298303.506116] Calibrating delay using timer specific routine.. 5319.66 BogoMIPS (lpj=2659833)
[4298303.506116] CPU: L1 I cache: 32K, L1 D cache: 32K
[4298303.506116] CPU: L2 cache: 4096K
[4298303.506116] CPU: Physical Processor ID: 3
[4298303.506116] CPU: Processor Core ID: 1
[4298303.506116] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106
[4298303.582937] CPU3: Intel(R) Xeon(R) CPU 5150 @ 2.66GHz stepping 06
[4298303.585087] checking TSC synchronization [CPU#0 -> CPU#3]: passed.
[4298303.707287] Switched to high resolution mode on CPU 3
[4298303.712943] kvm: enabling virtualization on CPU3
[4298303.713955] CPU0 attaching sched-domain:
[4298303.713901] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158
[4298303.713901] IP: [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c
[4298303.713901] PGD 0
[4298303.713901] Oops: 0000 [1] PREEMPT SMP
[4298303.713901] CPU 3
[4298303.713901] Modules linked in: kvm_intel kvm netconsole autofs4 nfs lockd nfs_acl sunrpc bridge llc acpi_cpufreq backlight sg e1000 button serio_raw rtc_cmos rtc_core rtc_lib ata_piix dm_snapshot dm_mod ahci libata dock sd_mod scsi_mod [last unloaded: kvm]
[4298303.713901] Pid: 15115, comm: migration/3 Not tainted 2.6.26-rc2 #723
[4298303.713901] RIP: 0010:[<ffffffff8022e722>]  [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c
[4298303.713901] RSP: 0018:ffff81004fdfbe20  EFLAGS: 00010046
[4298303.713901] RAX: 0000000000000000 RBX: ffff81000103df80 RCX: 0000000000000000
[4298303.713901] RDX: ffff81000103e038 RSI: 000000003b9aca00 RDI: ffff81000103df00
[4298303.713901] RBP: ffff81004fdfbe40 R08: ffff81004fdfbdd0 R09: ffff81000103a0a0
[4298303.713901] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000000
[4298303.713901] R13: ffff81000103df00 R14: ffffffff8060a140 R15: 0000000000000003
[4298303.713901] FS:  0000000000000000(0000) GS:ffff81007f806a80(0000) knlGS:0000000000000000
[4298303.713901] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[4298303.713901] CR2: 0000000000000158 CR3: 0000000000201000 CR4: 00000000000026a0
[4298303.713901] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4298303.713901] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4298303.713901] Process migration/3 (pid: 15115, threadinfo ffff81004fdfa000, task ffff81003f5f0700)
[4298303.713901] Stack: ffff81004fdfbe40 ffff81000103df00 ffffffff804543c0 ffff810020985bf8
[4298303.713901]  ffff81004fdfbee0 ffffffff804373fe ffff81004fdfbe80 ffffffff8023060a
[4298303.713901]  ffff81000103df00 ffff81003f5f0700 0000000000000000 ffff81003f5f0a60
[4298303.713901] Call Trace:
[4298303.713901]  [<ffffffff804373fe>] schedule+0x414/0x6ab
[4298303.713901]  [<ffffffff8023060a>] ? hrtick_set+0x9d/0xe8
[4298303.713901]  [<ffffffff8043772f>] ? thread_return+0x9a/0xbf
[4298303.713901]  [<ffffffff80231652>] migration_thread+0x185/0x22d
[4298303.713901]  [<ffffffff802314cd>] ? migration_thread+0x0/0x22d
[4298303.713901]  [<ffffffff8024afe6>] kthread+0x49/0x77
[4298303.713901]  [<ffffffff8020d228>] child_rip+0xa/0x12
[4298303.713901]  [<ffffffff8024af9d>] ? kthread+0x0/0x77
[4298303.713901]  [<ffffffff8020d21e>] ? child_rip+0x0/0x12
[4298303.713901]
[4298303.713901]
[4298303.713901] Code: c0 74 28 48 8b 7b 58 4c 8d 60 f0 48 85 ff 74 10 4c 89 e6 e8 df cc ff ff 85 c0 75 04 4c 8b 63 58 4c 89 e6 48 89 df e8 4a e5 ff ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 bf 49 83 ec 38 4c 89 ef 4c
[4298303.713901] RIP  [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c

This seems to be the assignment to cfs_rq after pick_next_entity().

I'm running kvm.git, which is currently 2.6.26-rc2 plus a few kvm
patches. It could be kvm's fault, but it doesn't appear so from the
traces.

--
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 12+ messages in thread
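For readers following along: pick_next_task_fair() repeatedly picks the leftmost entity of the current cfs_rq and, when that entity represents a task group, descends into the entity's group runqueue. A rough user-space model of that loop (hypothetical names and simplified structures, not the kernel's actual definitions) shows why a NULL return from pick_next_entity() faults at a small offset like 0x158 — the descent dereferences the picked entity to find its group runqueue:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the CFS group pick loop; the real
 * kernel structures are far richer.  The crash scenario: a group
 * cfs_rq claims nr_running > 0 while its tree is actually empty, so
 * pick_next_entity() returns NULL and the kernel faults dereferencing
 * se->my_q (the field at a small offset such as 0x158). */

struct cfs_rq;

struct sched_entity {
	struct cfs_rq *my_q;		/* non-NULL only for group entities */
};

struct cfs_rq {
	unsigned long nr_running;
	struct sched_entity *leftmost;	/* stand-in for the rbtree leftmost */
};

static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
	return cfs_rq->leftmost;	/* NULL if the tree is (wrongly) empty */
}

/* Returns the picked leaf entity, or NULL instead of crashing when the
 * tree is inconsistent (nr_running > 0 but nothing queued). */
static struct sched_entity *pick_next(struct cfs_rq *cfs_rq)
{
	struct sched_entity *se = NULL;

	while (cfs_rq) {
		se = pick_next_entity(cfs_rq);
		if (!se)
			return NULL;	/* the kernel would oops on se->my_q here */
		cfs_rq = se->my_q;	/* descend into the group's runqueue, if any */
	}
	return se;
}
```

This is only a sketch of the control flow under discussion; the thread's debug patches below add exactly this kind of NULL check, but for diagnosis rather than as a fix.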
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-13 14:33 [BUG] cpu hotplug vs scheduler Avi Kivity
@ 2008-05-13 15:33 ` Avi Kivity
  2008-05-13 19:00   ` Heiko Carstens
  2008-05-14  8:13 ` Dmitry Adamushko
  1 sibling, 1 reply; 12+ messages in thread
From: Avi Kivity @ 2008-05-13 15:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

Avi Kivity wrote:
> I'm testing host cpu hotplug with kvm. Basically running 7 guests on
> a 4 core machine, offlining and onlining host cpus at random.
> Eventually I hit this:
> [snip oops]
> This seems to be the assignment to cfs_rq after pick_next_entity().
>
> I'm running kvm.git, which is currently 2.6.26-rc2 plus a few kvm
> patches. It could be kvm's fault, but it doesn't appear so from the
> traces.
>

I get the same oops running with the kvm modules unloaded. None of
kvm.git's changes are in core code, so this is definitely a mainline
problem.

It took about 40 minutes to reproduce; 1200 cpus onlined and 1200 cpus
offlined.

--
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-13 15:33 ` Avi Kivity
@ 2008-05-13 19:00   ` Heiko Carstens
  0 siblings, 0 replies; 12+ messages in thread
From: Heiko Carstens @ 2008-05-13 19:00 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-kernel, Ingo Molnar

On Tue, May 13, 2008 at 06:33:00PM +0300, Avi Kivity wrote:
> Avi Kivity wrote:
>> I'm testing host cpu hotplug with kvm. Basically running 7 guests on
>> a 4 core machine, offlining and onlining host cpus at random.
>> Eventually I hit this:
>>
>
> [snip oops]
>
>> This seems to be the assignment to cfs_rq after pick_next_entity().
>>
>> I'm running kvm.git, which is currently 2.6.26-rc2 plus a few kvm
>> patches. It could be kvm's fault, but it doesn't appear so from the
>> traces.
>>
>
> I get the same oops running with the kvm modules unloaded. None of
> kvm.git's changes are in core code, so this is definitely a mainline
> problem.
>
> It took about 40 minutes to reproduce; 1200 cpus onlined and 1200 cpus
> offlined.

FWIW, this happens on s390 as well (without kvm support). But until now
I didn't have time to look into it.
Didn't report it here yet, since we had plenty of cpu hotplug changes
in arch code, so I suspected a bug there. This doesn't seem to be the
case.
It takes only a few minutes to trigger the bug here.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-13 14:33 [BUG] cpu hotplug vs scheduler Avi Kivity
  2008-05-13 15:33 ` Avi Kivity
@ 2008-05-14  8:13 ` Dmitry Adamushko
  2008-05-14 12:30   ` Avi Kivity
  2008-05-21 14:48   ` [BUG] hotplug cpus on ia64 Cliff Wickman
  1 sibling, 2 replies; 12+ messages in thread
From: Dmitry Adamushko @ 2008-05-14  8:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-kernel, Ingo Molnar, Heiko Carstens, Peter Zijlstra,
	Srivatsa Vaddagiri

[-- Attachment #1: Type: text/plain, Size: 2720 bytes --]

Hi,

> [ ... ]
>
> [4298303.713901] Call Trace:
> [4298303.713901]  [<ffffffff804373fe>] schedule+0x414/0x6ab
> [4298303.713901]  [<ffffffff8023060a>] ? hrtick_set+0x9d/0xe8
> [4298303.713901]  [<ffffffff8043772f>] ? thread_return+0x9a/0xbf
> [4298303.713901]  [<ffffffff80231652>] migration_thread+0x185/0x22d
> [4298303.713901]  [<ffffffff802314cd>] ? migration_thread+0x0/0x22d
> [4298303.713901]  [<ffffffff8024afe6>] kthread+0x49/0x77
> [4298303.713901]  [<ffffffff8020d228>] child_rip+0xa/0x12
> [4298303.713901]  [<ffffffff8024af9d>] ? kthread+0x0/0x77
> [4298303.713901]  [<ffffffff8020d21e>] ? child_rip+0x0/0x12
> [4298303.713901]
> [4298303.713901]
> [4298303.713901] Code: c0 74 28 48 8b 7b 58 4c 8d 60 f0 48 85 ff 74 10 4c 89 e6 e8 df cc ff ff 85 c0 75 04 4c 8b 63 58 4c 89 e6 48 89 df e8 4a e5 ff ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 bf 49 83 ec 38 4c 89 ef 4c
> [4298303.713901] RIP [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c
>
> This seems to be the assignment to cfs_rq after pick_next_entity().

[ cc'ed a few folks. ]

So the cfs-tree likely gets out-of-sync. I presume it won't be
reproducible with the CONFIG_SCHED_GROUP options disabled.

Anyway, would you try one of these debug patches (not sure about the
workability of the second one though :-/)?

Let's check what the values of 'cfs_rq->load.weight/nr_running' are.

thanks in advance,

(non-whitespace-damaged versions are enclosed)

---

--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1291,6 +1291,12 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
 
 	do {
 		se = pick_next_entity(cfs_rq);
+
+		if (unlikely(!se))
+			printk(KERN_ERR "BUG: se == NULL but nr_running (%ld), load (%ld),"
+				" rq-nr_running (%ld), rq-load (%ld)\n",
+				cfs_rq->nr_running, cfs_rq->load.weight, rq->nr_running, rq->load.weight);
+
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);
 

---

--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1280,6 +1280,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p)
 		resched_task(curr);
 }
 
+static void sysrq_sched_debug_show(void);
+
 static struct task_struct *pick_next_task_fair(struct rq *rq)
 {
 	struct task_struct *p;
@@ -1291,6 +1293,10 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
 
 	do {
 		se = pick_next_entity(cfs_rq);
+
+		if (unlikely(!se))
+			sysrq_sched_debug_show();
+
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);
 

---

--
Best regards,
Dmitry Adamushko

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: cfs_rq-debug-2.patch --]
[-- Type: text/x-patch; name=cfs_rq-debug-2.patch, Size: 630 bytes --]

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index e24ecd3..1dcc470 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1280,6 +1280,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p)
 		resched_task(curr);
 }
 
+static void sysrq_sched_debug_show(void);
+
 static struct task_struct *pick_next_task_fair(struct rq *rq)
 {
 	struct task_struct *p;
@@ -1291,6 +1293,10 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
 
 	do {
 		se = pick_next_entity(cfs_rq);
+
+		if (unlikely(!se))
+			sysrq_sched_debug_show();
+
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);
 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: cfs_rq-debug-3.patch --]
[-- Type: text/x-patch; name=cfs_rq-debug-3.patch, Size: 542 bytes --]

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index e24ecd3..e21e020 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1291,6 +1291,12 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
 
 	do {
 		se = pick_next_entity(cfs_rq);
+
+		if (unlikely(!se))
+			printk(KERN_ERR "BUG: se == NULL but nr_running (%ld), load (%ld),"
+				" rq-nr_running (%ld), rq-load (%ld)\n",
+				cfs_rq->nr_running, cfs_rq->load.weight, rq->nr_running, rq->load.weight);
+
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler 2008-05-14 8:13 ` Dmitry Adamushko @ 2008-05-14 12:30 ` Avi Kivity 2008-05-14 13:05 ` Dmitry Adamushko 2008-05-21 14:48 ` [BUG] hotplug cpus on ia64 Cliff Wickman 1 sibling, 1 reply; 12+ messages in thread From: Avi Kivity @ 2008-05-14 12:30 UTC (permalink / raw) To: Dmitry Adamushko Cc: linux-kernel, Ingo Molnar, Heiko Carstens, Peter Zijlstra, Srivatsa Vaddagiri Dmitry Adamushko wrote: > Hi, > > >> [ ... ] >> >> [4298303.713901] Call Trace: >> [4298303.713901] [<ffffffff804373fe>] schedule+0x414/0x6ab >> [4298303.713901] [<ffffffff8023060a>] ? hrtick_set+0x9d/0xe8 >> [4298303.713901] [<ffffffff8043772f>] ? thread_return+0x9a/0xbf >> [4298303.713901] [<ffffffff80231652>] migration_thread+0x185/0x22d >> [4298303.713901] [<ffffffff802314cd>] ? migration_thread+0x0/0x22d >> [4298303.713901] [<ffffffff8024afe6>] kthread+0x49/0x77 >> [4298303.713901] [<ffffffff8020d228>] child_rip+0xa/0x12 >> [4298303.713901] [<ffffffff8024af9d>] ? kthread+0x0/0x77 >> [4298303.713901] [<ffffffff8020d21e>] ? child_rip+0x0/0x12 >> [4298303.713901] >> [4298303.713901] >> [4298303.713901] Code: c0 74 28 48 8b 7b 58 4c 8d 60 f0 48 85 ff 74 10 4c >> 89 e6 e8 df cc ff ff 85 c0 75 04 4c 8b 63 58 4c 89 e6 48 89 df e8 4a e5 ff >> ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 bf 49 83 ec 38 4c 89 ef 4c >> [4298303.713901] RIP [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c >> >> This seems to be the assignment to cfs_rq after pick_next_entity(). >> > > [ cc'ed a few folks. ] > > > So the cfs-tree likely gets out-of-sync. I pressume, it won't be > reproducible with CONFIG_SCHED_GROUP options being disabled. > > Anyway, would you try one of these debug-patches (not sure about the > workability of the second one though :-/) > > Let's check what are the values for 'cfs_rq->weight.load/nr_running'. 
> > Got this for the first patch: [4302727.615522] Booting processor 3/7 ip 6000 [4302727.625923] Initializing CPU#3 [4302727.625923] Calibrating delay using timer specific routine.. 5319.76 BogoMIPS (lpj=2659883) [4302727.625923] CPU: L1 I cache: 32K, L1 D cache: 32K [4302727.625923] CPU: L2 cache: 4096K [4302727.625923] CPU: Physical Processor ID: 3 [4302727.625923] CPU: Processor Core ID: 1 [4302727.625923] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106 [4302727.692484] CPU3: Intel(R) Xeon(R) CPU 5150 @ 2.66GHz stepping 06 [4302727.694236] checking TSC synchronization [CPU#1 -> CPU#3]: passed. [4302727.824185] Switched to high resolution mode on CPU 3 [4302727.859184] kvm: enabling virtualization on CPU3 [4302727.859714] Sched Debug Version: v0.07, 2.6.26-rc2 #726 [4302727.859714] now at 6918576.148656 msecs [4302727.859714] .sysctl_sched_latency : 60.000000 [4302727.859714] .sysctl_sched_min_granularity : 12.000000 [4302727.859714] .sysctl_sched_wakeup_granularity : 30.000000 [4302727.859714] .sysctl_sched_child_runs_first : 0.000001 [4302727.860191] .sysctl_sched_features : 895 [4302727.860191] [4302727.860191] cpu#0, 2659.999 MHz [4302727.860191] .nr_running : 2 [4302727.860191] .load : 841 [4302727.860191] .nr_switches : 3427530 [4302727.861205] .nr_load_updates : 2183358 [4302727.861205] .nr_uninterruptible : 15 [4302727.861205] .jiffies : 4301585875 [4302727.861205] .next_balance : 4301.585696 [4302727.861205] .curr->pid : 4678 [4302727.861205] .clock : 6918579.002757 [4302727.862216] .cpu_load[0] : 841 [4302727.862216] .cpu_load[1] : 841 [4302727.862216] .cpu_load[2] : 841 [4302727.862216] .cpu_load[3] : 841 [4302727.862216] .cpu_load[4] : 841 [4302727.862216] [4302727.862216] cfs_rq[0]: [4302727.867209] .exec_clock : 3970.569663 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 5178969.408050 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 
[4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 6 [4302727.867209] .shares : 1024 [4302727.867209] [4302727.867209] cfs_rq[0]: [4302727.867209] .exec_clock : 14.588517 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 5178971.405628 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 1024 [4302727.867209] [4302727.867209] cfs_rq[0]: [4302727.867209] .exec_clock : 41.615870 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 5178973.403544 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 2 [4302727.867209] .shares : 1024 [4302727.867209] [4302727.867209] cfs_rq[0]: [4302727.867209] .exec_clock : 0.000000 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 5178975.401320 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867209] cfs_rq[0]: [4302727.867209] .exec_clock : 0.000001 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 5178977.398314 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 0 [4302727.867209] 
[4302727.867209] cfs_rq[0]: [4302727.867209] .exec_clock : 2165242.484786 [4302727.867209] .MIN_vruntime : 10323214.742376 [4302727.867209] .min_vruntime : 5178979.396488 [4302727.867209] .max_vruntime : 10323214.742376 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : 0.000000 [4302727.867209] .nr_running : 2 [4302727.867209] .load : 2048 [4302727.867209] .bkl_count : 513 [4302727.867209] .nr_spread_over : 1789825 [4302727.867209] .shares : 843 [4302727.867209] [4302727.867209] runnable tasks: [4302727.867209] task PID tree-key switches prio exec-runtime sum-exec sum-sleep [4302727.867209] ---------------------------------------------------------------------------------------------------------- [4302727.867209] Rqemu-system-x86 4678 10323337.578253 553310 120 10323337.578255 1380505.796830 42439.250368 [4302727.867209] [4302727.867209] cpu#1, 2659.999 MHz [4302727.867209] .nr_running : 3 [4302727.867209] .load : 415 [4302727.867209] .nr_switches : 629498 [4302727.867209] .nr_load_updates : 838874 [4302727.867209] .nr_uninterruptible : -6 [4302727.867209] .jiffies : 4301585895 [4302727.867209] .next_balance : 4301.585634 [4302727.867209] .curr->pid : 7799 [4302727.867209] .clock : 6918576.130865 [4302727.867209] .cpu_load[0] : 415 [4302727.867209] .cpu_load[1] : 415 [4302727.867209] .cpu_load[2] : 415 [4302727.867209] .cpu_load[3] : 415 [4302727.867209] .cpu_load[4] : 415 [4302727.867209] [4302727.867209] cfs_rq[1]: [4302727.867209] .exec_clock : 74.637431 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 759396.868495 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : -4419588.520858 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 299 [4302727.867209] .nr_spread_over : 1 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867209] cfs_rq[1]: [4302727.867209] .exec_clock : 22.707771 [4302727.867209] .MIN_vruntime : 0.000001 
[4302727.867209] .min_vruntime : 759396.868495 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : -4419590.518446 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 299 [4302727.867209] .nr_spread_over : 1 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867209] cfs_rq[1]: [4302727.867209] .exec_clock : 0.033026 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 759396.868495 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : -4419590.518446 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 299 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867209] cfs_rq[1]: [4302727.867209] .exec_clock : 0.000000 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 759396.868495 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : -4419590.518446 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 299 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867209] cfs_rq[1]: [4302727.867209] .exec_clock : 0.026450 [4302727.867209] .MIN_vruntime : 0.000001 [4302727.867209] .min_vruntime : 759396.868495 [4302727.867209] .max_vruntime : 0.000001 [4302727.867209] .spread : 0.000000 [4302727.867209] .spread0 : -4419590.518446 [4302727.867209] .nr_running : 0 [4302727.867209] .load : 0 [4302727.867209] .bkl_count : 299 [4302727.867209] .nr_spread_over : 0 [4302727.867209] .shares : 0 [4302727.867209] [4302727.867210] cfs_rq[1]: [4302727.867210] .exec_clock : 754981.092689 [4302727.867210] .MIN_vruntime : 1239813.449102 [4302727.867210] .min_vruntime : 759396.868495 [4302727.867210] .max_vruntime : 1239819.334711 [4302727.867210] .spread : 5.885609 [4302727.867210] .spread0 : 
-4419590.518446 [4302727.867210] .nr_running : 3 [4302727.867210] .load : 3072 [4302727.867210] .bkl_count : 299 [4302727.867210] .nr_spread_over : 53817 [4302727.867210] .shares : 415 [4302727.867210] [4302727.867210] runnable tasks: [4302727.867210] task PID tree-key switches prio exec-runtime sum-exec sum-sleep [4302727.867210] ---------------------------------------------------------------------------------------------------------- [4302727.900184] qemu-system-x86 4987 1239813.449102 709828 120 1239813.449102 1410504.949783 22865.206608 [4302727.900184] qemu-system-x86 5052 1239819.334711 530481 120 1239819.334711 1365146.519564 50937.064744 [4302727.900184] Rtoggle-processo 7799 1239811.208673 47886 120 1239811.208673 57552.631854 1592798.974913 [4302727.900184] [4302727.900184] cpu#3, 2659.999 MHz [4302727.900184] .nr_running : 1 [4302727.900184] .load : 285 [4302727.900184] .nr_switches : 611209 [4302727.900184] .nr_load_updates : 843051 [4302727.900184] .nr_uninterruptible : -2 [4302727.900184] .jiffies : 4301585916 [4302727.900184] .next_balance : 4301.586873 [4302727.900184] .curr->pid : 0 [4302727.900184] .clock : 6918576.376068 [4302727.900184] .cpu_load[0] : 0 [4302727.900184] .cpu_load[1] : 0 [4302727.900184] .cpu_load[2] : 0 [4302727.900184] .cpu_load[3] : 181 [4302727.900184] .cpu_load[4] : 1108 [4302727.900184] [4302727.900184] cfs_rq[3]: [4302727.900184] .exec_clock : 8.224765 [4302727.900184] BUG: spinlock recursion on CPU#3, swapper/0 [4302727.900184] lock: ffff81000103df00, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 3 [4302727.900184] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #726 [4302727.900184] [4302727.900184] Call Trace: [4302727.900184] [<ffffffff803249de>] spin_bug+0x9e/0xe9 [4302727.900184] [<ffffffff80324af4>] _raw_spin_lock+0x41/0x123 [4302727.900184] [<ffffffff80439638>] _spin_lock_irqsave+0x2f/0x37 [4302727.900184] [<ffffffff8022ef7c>] print_cfs_rq+0xca/0x46a [4302727.900184] [<ffffffff80231f97>] 
sched_debug_show+0x7a3/0xb8c [4302727.900184] [<ffffffff8023238d>] sysrq_sched_debug_show+0xd/0xf [4302727.900184] [<ffffffff802323ee>] pick_next_task_fair+0x5f/0x86 [4302727.900184] [<ffffffff804373f6>] schedule+0x3fc/0x6ab [4302727.900184] [<ffffffff8024e03f>] ? ktime_get_ts+0x49/0x4e [4302727.900184] [<ffffffff80253a28>] ? tick_nohz_stop_idle+0x2d/0x54 [4302727.900184] [<ffffffff8021283f>] ? mwait_idle+0x0/0x59 [4302727.900184] [<ffffffff8020ae37>] cpu_idle+0xc8/0xd7 [4302727.900184] [<ffffffff804332f1>] start_secondary+0x173/0x178 [4302727.900184] -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-14 12:30 ` Avi Kivity
@ 2008-05-14 13:05   ` Dmitry Adamushko
  2008-05-15 10:19     ` Avi Kivity
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Adamushko @ 2008-05-14 13:05 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-kernel, Ingo Molnar, Heiko Carstens, Peter Zijlstra,
	Srivatsa Vaddagiri

[-- Attachment #1: Type: text/plain, Size: 2368 bytes --]

2008/5/14 Avi Kivity <avi@qumranet.com>:
> [ ... ]
>
> [4302727.900184] Call Trace:
> [4302727.900184]  [<ffffffff803249de>] spin_bug+0x9e/0xe9
> [4302727.900184]  [<ffffffff80324af4>] _raw_spin_lock+0x41/0x123
> [4302727.900184]  [<ffffffff80439638>] _spin_lock_irqsave+0x2f/0x37
> [4302727.900184]  [<ffffffff8022ef7c>] print_cfs_rq+0xca/0x46a
> [4302727.900184]  [<ffffffff80231f97>] sched_debug_show+0x7a3/0xb8c
> [4302727.900184]  [<ffffffff8023238d>] sysrq_sched_debug_show+0xd/0xf
> [4302727.900184]  [<ffffffff802323ee>] pick_next_task_fair+0x5f/0x86

Err... sorry for the broken patch. The patch below, on top of the
previous one, should address this issue (ugly, but should be ok for
debugging).
'tasklist_lock' shouldn't cause a double lock, I guess.

Sorry for the rather 'blind' attempts. If not, then I'll
prepare/test/take a closer look at it later today when I'm at home.

TIA,

---

--- kernel/sched_debug-prev.c	2008-05-14 14:53:28.000000000 +0200
+++ kernel/sched_debug.c	2008-05-14 14:58:12.000000000 +0200
@@ -125,6 +125,7 @@ void print_cfs_rq(struct seq_file *m, in
 	char path[128] = "";
 	struct cgroup *cgroup = NULL;
 	struct task_group *tg = cfs_rq->tg;
+	int was_locked;
 
 	if (tg)
 		cgroup = tg->css.cgroup;
@@ -138,7 +139,11 @@ void print_cfs_rq(struct seq_file *m, in
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "exec_clock",
 			SPLIT_NS(cfs_rq->exec_clock));
 
-	spin_lock_irqsave(&rq->lock, flags);
+	was_locked = spin_is_locked(&rq->lock);
+
+	if (!was_locked)
+		spin_lock_irqsave(&rq->lock, flags);
+
 	if (cfs_rq->rb_leftmost)
 		MIN_vruntime = (__pick_next_entity(cfs_rq))->vruntime;
 	last = __pick_last_entity(cfs_rq);
@@ -146,7 +151,10 @@ void print_cfs_rq(struct seq_file *m, in
 		max_vruntime = last->vruntime;
 	min_vruntime = rq->cfs.min_vruntime;
 	rq0_min_vruntime = per_cpu(runqueues, 0).cfs.min_vruntime;
-	spin_unlock_irqrestore(&rq->lock, flags);
+
+	if (!was_locked)
+		spin_unlock_irqrestore(&rq->lock, flags);
+
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "MIN_vruntime",
 			SPLIT_NS(MIN_vruntime));
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "min_vruntime",

---

--
Best regards,
Dmitry Adamushko

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: is_locked.diff --]
[-- Type: text/x-patch; name=is_locked.diff, Size: 1211 bytes --]

--- kernel/sched_debug-prev.c	2008-05-14 14:53:28.000000000 +0200
+++ kernel/sched_debug.c	2008-05-14 14:58:12.000000000 +0200
@@ -125,6 +125,7 @@ void print_cfs_rq(struct seq_file *m, in
 	char path[128] = "";
 	struct cgroup *cgroup = NULL;
 	struct task_group *tg = cfs_rq->tg;
+	int was_locked;
 
 	if (tg)
 		cgroup = tg->css.cgroup;
@@ -138,7 +139,11 @@ void print_cfs_rq(struct seq_file *m, in
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "exec_clock",
 			SPLIT_NS(cfs_rq->exec_clock));
 
-	spin_lock_irqsave(&rq->lock, flags);
+	was_locked = spin_is_locked(&rq->lock);
+
+	if (!was_locked)
+		spin_lock_irqsave(&rq->lock, flags);
+
 	if (cfs_rq->rb_leftmost)
 		MIN_vruntime = (__pick_next_entity(cfs_rq))->vruntime;
 	last = __pick_last_entity(cfs_rq);
@@ -146,7 +151,10 @@ void print_cfs_rq(struct seq_file *m, in
 		max_vruntime = last->vruntime;
 	min_vruntime = rq->cfs.min_vruntime;
 	rq0_min_vruntime = per_cpu(runqueues, 0).cfs.min_vruntime;
-	spin_unlock_irqrestore(&rq->lock, flags);
+
+	if (!was_locked)
+		spin_unlock_irqrestore(&rq->lock, flags);
+
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "MIN_vruntime",
 			SPLIT_NS(MIN_vruntime));
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "min_vruntime",

^ permalink raw reply	[flat|nested] 12+ messages in thread
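The idea behind the patch above — take the lock only if it is not already held, and release it only if we took it ourselves — can be sketched in user space with pthreads, where trylock expresses the "acquire only if free" semantics directly (hypothetical names; this is an analogy, not the kernel code). Note that the kernel patch's spin_is_locked() test is racy when the lock is held by a *different* CPU, in which case the dump reads shared state unlocked; that is tolerable for a one-shot debug printout, not for production code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* User-space analogue of the debug patch's conditional-lock pattern.
 * pthread_mutex_trylock() on a default (non-recursive) mutex returns
 * EBUSY if the mutex is already locked -- including by the calling
 * thread -- so a caller that already holds the lock higher up the
 * call chain does not self-deadlock here. */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int dumps_done;		/* observable side effect for testing */

void dump_state(void)
{
	/* true if we acquired the lock here; false if it was already
	 * held (possibly by ourselves, earlier in the call chain) */
	bool locked_here = (pthread_mutex_trylock(&lock) == 0);

	/* ... read and print shared state ... */
	dumps_done++;

	if (locked_here)
		pthread_mutex_unlock(&lock);	/* release only what we took */
}
```

The same caveat as in the thread applies: a dump taken without the lock may see inconsistent state, which is exactly the trade-off the "was_locked" workaround accepts for debugging.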
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-14 13:05 ` Dmitry Adamushko
@ 2008-05-15 10:19   ` Avi Kivity
  2008-05-21 12:31     ` Heiko Carstens
  0 siblings, 1 reply; 12+ messages in thread
From: Avi Kivity @ 2008-05-15 10:19 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: linux-kernel, Ingo Molnar, Heiko Carstens, Peter Zijlstra,
	Srivatsa Vaddagiri

Dmitry Adamushko wrote:
> 2008/5/14 Avi Kivity <avi@qumranet.com>:
>
>> [ ... ]
>>
>> [4302727.900184] Call Trace:
>> [4302727.900184]  [<ffffffff803249de>] spin_bug+0x9e/0xe9
>> [4302727.900184]  [<ffffffff80324af4>] _raw_spin_lock+0x41/0x123
>> [4302727.900184]  [<ffffffff80439638>] _spin_lock_irqsave+0x2f/0x37
>> [4302727.900184]  [<ffffffff8022ef7c>] print_cfs_rq+0xca/0x46a
>> [4302727.900184]  [<ffffffff80231f97>] sched_debug_show+0x7a3/0xb8c
>> [4302727.900184]  [<ffffffff8023238d>] sysrq_sched_debug_show+0xd/0xf
>> [4302727.900184]  [<ffffffff802323ee>] pick_next_task_fair+0x5f/0x86
>>
>
> Err... sorry for the broken patch. The patch below on top of the
> previous one should address this issue (ugly, but should be ok for
> debugging).
> 'tasklist_lock' shouldn't cause a double lock, I guess.
>
> Sorry for rather 'blind' attempts. If no, then I'll
> prepare/test/take-a-closer-look at it later today when I'm at home.
>

Now it died when allocating the pda:

[4301986.526627] Booting processor 1/6 ip 6000
[4301986.537195] swapper: page allocation failure. order:2, mode:0x20
[4301986.537195] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #728
[4301986.537195]
[4301986.537195] Call Trace:
[4301986.537195]  [<ffffffff802388ae>] ? __printk_ratelimit+0x9/0xb
[4301986.537195]  [<ffffffff80272e07>] __alloc_pages_internal+0x387/0x39f
[4301986.537195]  [<ffffffff80272e35>] __alloc_pages+0xb/0xd
[4301986.537195]  [<ffffffff80272e4c>] __get_free_pages+0x15/0x46
[4301986.537195]  [<ffffffff80210cdf>] pda_init+0x87/0xac
[4301986.537195]  [<ffffffff8042fed1>] cpu_init+0x5e/0x34a
[4301986.537195]  [<ffffffff8026b437>] ? rcu_needs_cpu+0x36/0x47
[4301986.537195]  [<ffffffff802311a7>] ? idle_task_exit+0x71/0xad
[4301986.537195]  [<ffffffff804331aa>] start_secondary+0xc/0x178
[4301986.537195]
[4301986.537195] Mem-info:
[4301986.537195] DMA per-cpu:
[4301986.537195] CPU 0: hi: 0, btch: 1 usd: 0
[4301986.537195] CPU 2: hi: 0, btch: 1 usd: 0
[4301986.537195] DMA32 per-cpu:
[4301986.537195] CPU 0: hi: 186, btch: 31 usd: 153
[4301986.537195] CPU 2: hi: 186, btch: 31 usd: 135
[4301986.537195] Active:347879 inactive:109278 dirty:107 writeback:0 unstable:0
[4301986.537195]  free:5311 slab:10186 mapped:30496 pagetables:1587 bounce:0
[4301986.537195] DMA free:8012kB min:24kB low:28kB high:36kB active:60kB inactive:0kB present:8644kB pages_scanned:0 all_unreclaimable? no
[4301986.537195] lowmem_reserve[]: 0 1999 1999 1999
[4301986.537195] DMA32 free:13232kB min:5704kB low:7128kB high:8556kB active:1391456kB inactive:437112kB present:2047248kB pages_scanned:0 all_unreclaimable? no
[4301986.537195] lowmem_reserve[]: 0 0 0 0
[4301986.537195] DMA: 19*4kB 16*8kB 10*16kB 11*32kB 6*64kB 4*128kB 1*256kB 2*512kB 3*1024kB 1*2048kB 0*4096kB = 8012kB
[4301986.537195] DMA32: 2034*4kB 570*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13224kB
[4301986.537195] 212750 total pagecache pages
[4301986.537195] Swap cache: add 73, delete 18, find 0/0
[4301986.537195] Free swap = 3210964kB
[4301986.537195] Total swap = 3211256kB

There seem to be some order-2 allocations available, though. And in any
case, cpu hotplug shouldn't crash on such failures, only fail.

--
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 12+ messages in thread
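As background (editorial gloss, not from the thread): the `order:2` in the failure message is a buddy-allocator order, i.e. a request for 2^order physically contiguous pages. The per-zone lines in the dump (`19*4kB 16*8kB ...`) list how many free blocks of each order exist, which is why Avi can read order-2 availability off them:

```c
#include <assert.h>

/* Buddy-allocator arithmetic: an order-n request asks for 2^n
 * contiguous pages. PAGE_SIZE is assumed to be the common x86
 * value of 4 KiB; other architectures differ. */

#define PAGE_SIZE 4096UL

static unsigned long alloc_order_bytes(unsigned int order)
{
	return PAGE_SIZE << order;	/* 2^order pages */
}
```

So the failing order-2 request above is for 16 KiB of contiguous memory, and any free block of order 2 or higher (e.g. the `1*512kB` entry in the DMA32 line) could in principle be split to satisfy it, which is what makes the failure surprising.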
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-15 10:19 ` Avi Kivity
@ 2008-05-21 12:31   ` Heiko Carstens
  2008-05-21 12:42     ` Avi Kivity
  0 siblings, 1 reply; 12+ messages in thread
From: Heiko Carstens @ 2008-05-21 12:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Dmitry Adamushko, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Srivatsa Vaddagiri

> Now it died when allocating the pda:
>
> [4301986.526627] Booting processor 1/6 ip 6000
> [4301986.537195] swapper: page allocation failure. order:2, mode:0x20
> [4301986.537195] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #728
> [4301986.537195]
> [4301986.537195] Call Trace:
> [4301986.537195]  [<ffffffff802388ae>] ? __printk_ratelimit+0x9/0xb
> [4301986.537195]  [<ffffffff80272e07>] __alloc_pages_internal+0x387/0x39f
> [4301986.537195]  [<ffffffff80272e35>] __alloc_pages+0xb/0xd
> [4301986.537195]  [<ffffffff80272e4c>] __get_free_pages+0x15/0x46
> [4301986.537195]  [<ffffffff80210cdf>] pda_init+0x87/0xac
> [4301986.537195]  [<ffffffff8042fed1>] cpu_init+0x5e/0x34a
> [4301986.537195]  [<ffffffff8026b437>] ? rcu_needs_cpu+0x36/0x47
> [4301986.537195]  [<ffffffff802311a7>] ? idle_task_exit+0x71/0xad
> [4301986.537195]  [<ffffffff804331aa>] start_secondary+0xc/0x178
> [4301986.537195]
> [4301986.537195] Mem-info:
> [4301986.537195] DMA per-cpu:
> [4301986.537195] CPU 0: hi: 0, btch: 1 usd: 0
> [4301986.537195] CPU 2: hi: 0, btch: 1 usd: 0
> [4301986.537195] DMA32 per-cpu:
> [4301986.537195] CPU 0: hi: 186, btch: 31 usd: 153
> [4301986.537195] CPU 2: hi: 186, btch: 31 usd: 135
> [4301986.537195] Active:347879 inactive:109278 dirty:107 writeback:0 unstable:0
> [4301986.537195]  free:5311 slab:10186 mapped:30496 pagetables:1587 bounce:0
> [4301986.537195] DMA free:8012kB min:24kB low:28kB high:36kB active:60kB inactive:0kB present:8644kB pages_scanned:0 all_unreclaimable? no
> [4301986.537195] lowmem_reserve[]: 0 1999 1999 1999
> [4301986.537195] DMA32 free:13232kB min:5704kB low:7128kB high:8556kB active:1391456kB inactive:437112kB present:2047248kB pages_scanned:0 all_unreclaimable? no
> [4301986.537195] lowmem_reserve[]: 0 0 0 0
> [4301986.537195] DMA: 19*4kB 16*8kB 10*16kB 11*32kB 6*64kB 4*128kB 1*256kB 2*512kB 3*1024kB 1*2048kB 0*4096kB = 8012kB
> [4301986.537195] DMA32: 2034*4kB 570*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13224kB
> [4301986.537195] 212750 total pagecache pages
> [4301986.537195] Swap cache: add 73, delete 18, find 0/0
> [4301986.537195] Free swap = 3210964kB
> [4301986.537195] Total swap = 3211256kB
>
> There seem to be some order-2 allocations available, though. And in any
> case, cpu hotplug shouldn't crash on such failures, only fail.

Hi Avi,

I think you fixed this specific bug. Does the cpu hotplug stress test
now work for you?

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-21 12:31   ` Heiko Carstens
@ 2008-05-21 12:42     ` Avi Kivity
  2008-05-21 12:55       ` Heiko Carstens
  0 siblings, 1 reply; 12+ messages in thread
From: Avi Kivity @ 2008-05-21 12:42 UTC
To: Heiko Carstens
Cc: Dmitry Adamushko, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Srivatsa Vaddagiri

Heiko Carstens wrote:
>> Now it died when allocating the pda:
>>
>> [4301986.526627] Booting processor 1/6 ip 6000
>> [4301986.537195] swapper: page allocation failure. order:2, mode:0x20
>> [...]
>> There seem to be some order-2 allocations available, though. And in any
>> case, cpu hotplug shouldn't crash on such failures, only fail.
>
> Hi Avi,
>
> I think you fixed this specific bug. Does cpu hotplug stresstest now work
> for you?

Which bug, the order-2 allocs? I certainly didn't fix it.

Last I tried, cpu hotplug still failed, but it was some time ago as
patches flow.

-- 
error compiling committee.c: too many arguments to function
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-21 12:42     ` Avi Kivity
@ 2008-05-21 12:55       ` Heiko Carstens
  2008-05-21 13:03         ` Avi Kivity
  0 siblings, 1 reply; 12+ messages in thread
From: Heiko Carstens @ 2008-05-21 12:55 UTC
To: Avi Kivity
Cc: Dmitry Adamushko, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Srivatsa Vaddagiri

On Wed, May 21, 2008 at 03:42:51PM +0300, Avi Kivity wrote:
> Heiko Carstens wrote:
>> I think you fixed this specific bug. Does cpu hotplug stresstest now work
>> for you?
>
> Which bug, the order-2 allocs? I certainly didn't fix it.
>
> Last I tried, cpu hotplug still failed, but it was some time ago as
> patches flow.

Ah no. I thought "x86: fix crash on cpu hotplug on pat-incapable machines"
would have fixed the cpu hotplug bug for you.

I was actually talking of the pick_next_task_fair incarnation of one of
the cpu hotplug bugs. s390 backtrace on 2.6.25 looks like this.

<1>Unable to handle kernel pointer dereference at virtual kernel address 00000200000db000
<4>Oops: 003b [#1] PREEMPT SMP
<4>Modules linked in: dm_multipath sunrpc qeth_l2 dm_mod cu3088 qeth ccwgroup
<4>CPU: 0 Not tainted 2.6.25-27.x.20080422-s390xdefault #1
<4>Process migration/0 (pid: 5806, task: 00000000aac3a838, ksp: 000000017a2f3dd0)
<4>Krnl PSW : 0400200180000000 0000000000125f48 (pick_next_task_fair+0x34/0xd4)
<4>           R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:2 PM:0 EA:3
<4>Krnl GPRS: 0000079e029c6ecb 0000000000000000 000000008548ea60 000000008548d700
<4>           00000000004526f8 0000000000000597 0000000000000000 0000000000000000
<4>           0000000000672a00 0000000000000000 00000000aac3a838 000000017a2f3d10
<4>           00000200000db85a 00000000004672a0 000000017a2f3d50 000000017a2f3d10
<4>Krnl Code: 0000000000125f38: e3e0f0980024  stg %r14,152(%r15)
<4>           0000000000125f3e: d507d000c010  clc 0(8,%r13),16(%r12)
<4>           0000000000125f44: a784004e      brc 8,125fe0
<4>          >0000000000125f48: d507d000c030  clc 0(8,%r13),48(%r12)
<4>           0000000000125f4e: b904002c      lgr %r2,%r12
<4>           0000000000125f52: a7990000      lghi %r9,0
<4>           0000000000125f56: a7840033      brc 8,125fbc
<4>           0000000000125f5a: c0e5ffffee23  brasl %r14,123ba0
<4>Call Trace:
<4>([<00000000aac3a838>] 0xaac3a838)
<4> [<000000000045270c>] schedule+0x178/0x978
<4> [<000000000012fe86>] migration_thread+0x1aa/0x30c
<4> [<0000000000152ad8>] kthread+0x68/0xa0
<4> [<000000000010a0be>] kernel_thread_starter+0x6/0xc
<4> [<000000000010a0b8>] kernel_thread_starter+0x0/0xc
<4>Last Breaking-Event-Address:
<4> [<0000000000125fc6>] pick_next_task_fair+0xb2/0xd4
* Re: [BUG] cpu hotplug vs scheduler
  2008-05-21 12:55       ` Heiko Carstens
@ 2008-05-21 13:03         ` Avi Kivity
  0 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2008-05-21 13:03 UTC
To: Heiko Carstens
Cc: Dmitry Adamushko, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Srivatsa Vaddagiri

Heiko Carstens wrote:
> On Wed, May 21, 2008 at 03:42:51PM +0300, Avi Kivity wrote:
>> Heiko Carstens wrote:
>>> I think you fixed this specific bug. Does cpu hotplug stresstest now work
>>> for you?
>>
>> Which bug, the order-2 allocs? I certainly didn't fix it.
>>
>> Last I tried, cpu hotplug still failed, but it was some time ago as
>> patches flow.
>
> Ah no. I thought "x86: fix crash on cpu hotplug on pat-incapable machines"
> would have fixed the cpu hotplug bug for you.

No, that only fixes an immediate oops when running in a virtual machine
(which doesn't have pat).

> I was actually talking of the pick_next_task_fair incarnation of one of
> the cpu hotplug bugs.

That hasn't been addressed, AFAIK.

> s390 backtrace on 2.6.25 looks like this.
>
> [...]

Seems to be the same bug, yes.

-- 
error compiling committee.c: too many arguments to function
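[Editorial note: the common failure mode both the x86 and s390 traces above point at can be sketched with a toy model. This is an illustration only, not the real scheduler code; all structure and function names below are made up, and the "stale accounting" explanation is a plausible reading of the traces, not a confirmed root cause.]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the suspected bug: on a CPU being offlined, the fair
 * class's run count still says a task is queued, but the tree of
 * runnable entities is already empty, so the "leftmost" pointer the
 * picker hands back is NULL and the caller's field access oopses.
 */
struct sched_entity {
	unsigned long vruntime;
};

struct cfs_rq {
	unsigned long nr_running;          /* accounting counter */
	struct sched_entity *rb_leftmost;  /* stands in for the rbtree */
};

/* Shape of the buggy flow: trust nr_running, return whatever the tree
 * holds.  With stale accounting this is NULL despite nr_running > 0,
 * and dereferencing it is the reported NULL-pointer oops. */
static struct sched_entity *pick_next_entity_trusting(struct cfs_rq *cfs_rq)
{
	if (!cfs_rq->nr_running)
		return NULL;
	return cfs_rq->rb_leftmost;  /* may be NULL: caller oopses */
}

/* Defensive variant: cross-check the tree before handing anything out. */
static struct sched_entity *pick_next_entity_checked(struct cfs_rq *cfs_rq)
{
	if (!cfs_rq->nr_running || !cfs_rq->rb_leftmost)
		return NULL;
	return cfs_rq->rb_leftmost;
}
```

Note the checked variant only papers over the symptom; the real fix would be keeping the run count and the tree consistent while tasks are migrated off a dying CPU, so the two can never disagree in the first place.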
* [BUG] hotplug cpus on ia64
  2008-05-14  8:13 ` Dmitry Adamushko
  2008-05-14 12:30   ` Avi Kivity
@ 2008-05-21 14:48   ` Cliff Wickman
  1 sibling, 0 replies; 12+ messages in thread
From: Cliff Wickman @ 2008-05-21 14:48 UTC
To: dmitry.adamushko, avi, mingo, heiko.carstens, a.p.zijlstra, vatsa
Cc: Avi Kivity, linux-kernel, Ingo Molnar, Heiko Carstens, PeterZijlst

Gentlemen,

I built an ia64 kernel from Andrew's tree (2.6.26-rc2-mm1) and get a very
predictable hotplug cpu problem.

billberry1:/tmp/cpw # ./dis
disabled cpu 17
enabled cpu 17
billberry1:/tmp/cpw # ./dis
disabled cpu 17
enabled cpu 17
billberry1:/tmp/cpw # ./dis

The script that disables the cpu always hangs (unkillable) on the 3rd
attempt. I haven't spent any debugging time on it yet. Just wondering
if you've seen it? It doesn't seem to happen on x86_64.

-Cliff Wickman
cpw@sgi.com
end of thread, other threads: [~2008-05-21 14:45 UTC | newest]

Thread overview: 12+ messages
2008-05-13 14:33 [BUG] cpu hotplug vs scheduler Avi Kivity
2008-05-13 15:33 ` Avi Kivity
2008-05-13 19:00   ` Heiko Carstens
2008-05-14  8:13 ` Dmitry Adamushko
2008-05-14 12:30   ` Avi Kivity
2008-05-14 13:05     ` Dmitry Adamushko
2008-05-15 10:19       ` Avi Kivity
2008-05-21 12:31         ` Heiko Carstens
2008-05-21 12:42           ` Avi Kivity
2008-05-21 12:55             ` Heiko Carstens
2008-05-21 13:03               ` Avi Kivity
2008-05-21 14:48 ` [BUG] hotplug cpus on ia64 Cliff Wickman