From: Avi Kivity <avi@qumranet.com>
To: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Subject: Re: [BUG] cpu hotplug vs scheduler
Date: Wed, 14 May 2008 15:30:33 +0300
Message-ID: <482ADB69.8010305@qumranet.com>
In-Reply-To: <b647ffbd0805140113i4296076cna3f371c86c479653@mail.gmail.com>
Dmitry Adamushko wrote:
> Hi,
>
>
>> [ ... ]
>>
>> [4298303.713901] Call Trace:
>> [4298303.713901] [<ffffffff804373fe>] schedule+0x414/0x6ab
>> [4298303.713901] [<ffffffff8023060a>] ? hrtick_set+0x9d/0xe8
>> [4298303.713901] [<ffffffff8043772f>] ? thread_return+0x9a/0xbf
>> [4298303.713901] [<ffffffff80231652>] migration_thread+0x185/0x22d
>> [4298303.713901] [<ffffffff802314cd>] ? migration_thread+0x0/0x22d
>> [4298303.713901] [<ffffffff8024afe6>] kthread+0x49/0x77
>> [4298303.713901] [<ffffffff8020d228>] child_rip+0xa/0x12
>> [4298303.713901] [<ffffffff8024af9d>] ? kthread+0x0/0x77
>> [4298303.713901] [<ffffffff8020d21e>] ? child_rip+0x0/0x12
>> [4298303.713901]
>> [4298303.713901]
>> [4298303.713901] Code: c0 74 28 48 8b 7b 58 4c 8d 60 f0 48 85 ff 74 10 4c
>> 89 e6 e8 df cc ff ff 85 c0 75 04 4c 8b 63 58 4c 89 e6 48 89 df e8 4a e5 ff
>> ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 bf 49 83 ec 38 4c 89 ef 4c
>> [4298303.713901] RIP [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c
>>
>> This seems to be the assignment to cfs_rq after pick_next_entity().
>>
>
> [ cc'ed a few folks. ]
>
>
> So the cfs-tree likely gets out of sync. I presume it won't be
> reproducible with the group scheduling options (CONFIG_FAIR_GROUP_SCHED and friends) disabled.
>
> Anyway, could you try one of these debug patches? (I'm not sure the
> second one works, though :-/)
>
> Let's check the values of 'cfs_rq->load.weight' and 'cfs_rq->nr_running'.
>
>
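For context on the oops quoted above: with group scheduling enabled, pick_next_task_fair() in this kernel generation appears to descend the hierarchy roughly as `do { se = pick_next_entity(cfs_rq); cfs_rq = group_cfs_rq(se); } while (cfs_rq);`. If a group's queue is empty while its group entity is still queued in the parent, `se` comes back NULL and the `cfs_rq = group_cfs_rq(se)` step dereferences it. The following is a minimal user-space sketch, not kernel code: the struct layouts and the names `pick_entity()` and `pick_next_task_model()` are hypothetical simplifications. It performs the same descent but, in the spirit of the suggested debug check, reports nr_running and the load weight instead of faulting.

```c
#include <stddef.h>
#include <stdio.h>

struct cfs_rq;

/* Hypothetical, simplified scheduling entity (not the kernel's layout). */
struct sched_entity {
    struct cfs_rq *my_q;   /* group entity: the group's own queue; task: NULL */
};

/* Hypothetical, simplified per-level CFS run queue. */
struct cfs_rq {
    unsigned long nr_running;      /* entities this level claims to hold */
    unsigned long load_weight;     /* stand-in for cfs_rq->load.weight */
    struct sched_entity *leftmost; /* stand-in for the rbtree's leftmost node */
};

/* Stand-in for pick_next_entity(): leftmost entity, or NULL if the tree is empty. */
static struct sched_entity *pick_entity(struct cfs_rq *cfs_rq)
{
    return cfs_rq->leftmost;
}

/* Walks down the group hierarchy the way the loop described above does.
 * Returns the chosen task entity, or NULL if the top level is idle or a
 * lower level's counters and tree disagree (the "out-of-sync" case). */
static struct sched_entity *pick_next_task_model(struct cfs_rq *cfs_rq)
{
    struct sched_entity *se;

    if (cfs_rq->nr_running == 0)
        return NULL;               /* nothing runnable at the top level */

    do {
        se = pick_entity(cfs_rq);
        if (se == NULL) {
            /* The unguarded loop would dereference se here and oops. */
            fprintf(stderr, "out-of-sync cfs_rq: nr_running=%lu load.weight=%lu\n",
                    cfs_rq->nr_running, cfs_rq->load_weight);
            return NULL;
        }
        cfs_rq = se->my_q;         /* descend; NULL once se is a task */
    } while (cfs_rq != NULL);

    return se;
}
```

With a root queue holding one group entity whose queue still holds a task, the model returns that task. If the group's tree is emptied while the parent is left untouched, the model prints the counters and returns NULL where the real loop would crash.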
Got this for the first patch:
[4302727.615522] Booting processor 3/7 ip 6000
[4302727.625923] Initializing CPU#3
[4302727.625923] Calibrating delay using timer specific routine.. 5319.76 BogoMIPS (lpj=2659883)
[4302727.625923] CPU: L1 I cache: 32K, L1 D cache: 32K
[4302727.625923] CPU: L2 cache: 4096K
[4302727.625923] CPU: Physical Processor ID: 3
[4302727.625923] CPU: Processor Core ID: 1
[4302727.625923] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106
[4302727.692484] CPU3: Intel(R) Xeon(R) CPU 5150 @ 2.66GHz stepping 06
[4302727.694236] checking TSC synchronization [CPU#1 -> CPU#3]: passed.
[4302727.824185] Switched to high resolution mode on CPU 3
[4302727.859184] kvm: enabling virtualization on CPU3
[4302727.859714] Sched Debug Version: v0.07, 2.6.26-rc2 #726
[4302727.859714] now at 6918576.148656 msecs
[4302727.859714] .sysctl_sched_latency : 60.000000
[4302727.859714] .sysctl_sched_min_granularity : 12.000000
[4302727.859714] .sysctl_sched_wakeup_granularity : 30.000000
[4302727.859714] .sysctl_sched_child_runs_first : 0.000001
[4302727.860191] .sysctl_sched_features : 895
[4302727.860191]
[4302727.860191] cpu#0, 2659.999 MHz
[4302727.860191] .nr_running : 2
[4302727.860191] .load : 841
[4302727.860191] .nr_switches : 3427530
[4302727.861205] .nr_load_updates : 2183358
[4302727.861205] .nr_uninterruptible : 15
[4302727.861205] .jiffies : 4301585875
[4302727.861205] .next_balance : 4301.585696
[4302727.861205] .curr->pid : 4678
[4302727.861205] .clock : 6918579.002757
[4302727.862216] .cpu_load[0] : 841
[4302727.862216] .cpu_load[1] : 841
[4302727.862216] .cpu_load[2] : 841
[4302727.862216] .cpu_load[3] : 841
[4302727.862216] .cpu_load[4] : 841
[4302727.862216]
[4302727.862216] cfs_rq[0]:
[4302727.867209] .exec_clock : 3970.569663
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 5178969.408050
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 6
[4302727.867209] .shares : 1024
[4302727.867209]
[4302727.867209] cfs_rq[0]:
[4302727.867209] .exec_clock : 14.588517
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 5178971.405628
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 1024
[4302727.867209]
[4302727.867209] cfs_rq[0]:
[4302727.867209] .exec_clock : 41.615870
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 5178973.403544
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 2
[4302727.867209] .shares : 1024
[4302727.867209]
[4302727.867209] cfs_rq[0]:
[4302727.867209] .exec_clock : 0.000000
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 5178975.401320
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[0]:
[4302727.867209] .exec_clock : 0.000001
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 5178977.398314
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[0]:
[4302727.867209] .exec_clock : 2165242.484786
[4302727.867209] .MIN_vruntime : 10323214.742376
[4302727.867209] .min_vruntime : 5178979.396488
[4302727.867209] .max_vruntime : 10323214.742376
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 2
[4302727.867209] .load : 2048
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 1789825
[4302727.867209] .shares : 843
[4302727.867209]
[4302727.867209] runnable tasks:
[4302727.867209] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
[4302727.867209] ----------------------------------------------------------------------------------------------------------
[4302727.867209] Rqemu-system-x86 4678 10323337.578253 553310 120 10323337.578255 1380505.796830 42439.250368
[4302727.867209]
[4302727.867209] cpu#1, 2659.999 MHz
[4302727.867209] .nr_running : 3
[4302727.867209] .load : 415
[4302727.867209] .nr_switches : 629498
[4302727.867209] .nr_load_updates : 838874
[4302727.867209] .nr_uninterruptible : -6
[4302727.867209] .jiffies : 4301585895
[4302727.867209] .next_balance : 4301.585634
[4302727.867209] .curr->pid : 7799
[4302727.867209] .clock : 6918576.130865
[4302727.867209] .cpu_load[0] : 415
[4302727.867209] .cpu_load[1] : 415
[4302727.867209] .cpu_load[2] : 415
[4302727.867209] .cpu_load[3] : 415
[4302727.867209] .cpu_load[4] : 415
[4302727.867209]
[4302727.867209] cfs_rq[1]:
[4302727.867209] .exec_clock : 74.637431
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 759396.868495
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : -4419588.520858
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 299
[4302727.867209] .nr_spread_over : 1
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[1]:
[4302727.867209] .exec_clock : 22.707771
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 759396.868495
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : -4419590.518446
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 299
[4302727.867209] .nr_spread_over : 1
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[1]:
[4302727.867209] .exec_clock : 0.033026
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 759396.868495
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : -4419590.518446
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 299
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[1]:
[4302727.867209] .exec_clock : 0.000000
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 759396.868495
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : -4419590.518446
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 299
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[1]:
[4302727.867209] .exec_clock : 0.026450
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 759396.868495
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : -4419590.518446
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 299
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867210] cfs_rq[1]:
[4302727.867210] .exec_clock : 754981.092689
[4302727.867210] .MIN_vruntime : 1239813.449102
[4302727.867210] .min_vruntime : 759396.868495
[4302727.867210] .max_vruntime : 1239819.334711
[4302727.867210] .spread : 5.885609
[4302727.867210] .spread0 : -4419590.518446
[4302727.867210] .nr_running : 3
[4302727.867210] .load : 3072
[4302727.867210] .bkl_count : 299
[4302727.867210] .nr_spread_over : 53817
[4302727.867210] .shares : 415
[4302727.867210]
[4302727.867210] runnable tasks:
[4302727.867210] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
[4302727.867210] ----------------------------------------------------------------------------------------------------------
[4302727.900184] qemu-system-x86 4987 1239813.449102 709828 120 1239813.449102 1410504.949783 22865.206608
[4302727.900184] qemu-system-x86 5052 1239819.334711 530481 120 1239819.334711 1365146.519564 50937.064744
[4302727.900184] Rtoggle-processo 7799 1239811.208673 47886 120 1239811.208673 57552.631854 1592798.974913
[4302727.900184]
[4302727.900184] cpu#3, 2659.999 MHz
[4302727.900184] .nr_running : 1
[4302727.900184] .load : 285
[4302727.900184] .nr_switches : 611209
[4302727.900184] .nr_load_updates : 843051
[4302727.900184] .nr_uninterruptible : -2
[4302727.900184] .jiffies : 4301585916
[4302727.900184] .next_balance : 4301.586873
[4302727.900184] .curr->pid : 0
[4302727.900184] .clock : 6918576.376068
[4302727.900184] .cpu_load[0] : 0
[4302727.900184] .cpu_load[1] : 0
[4302727.900184] .cpu_load[2] : 0
[4302727.900184] .cpu_load[3] : 181
[4302727.900184] .cpu_load[4] : 1108
[4302727.900184]
[4302727.900184] cfs_rq[3]:
[4302727.900184] .exec_clock : 8.224765
[4302727.900184] BUG: spinlock recursion on CPU#3, swapper/0
[4302727.900184] lock: ffff81000103df00, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 3
[4302727.900184] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #726
[4302727.900184]
[4302727.900184] Call Trace:
[4302727.900184] [<ffffffff803249de>] spin_bug+0x9e/0xe9
[4302727.900184] [<ffffffff80324af4>] _raw_spin_lock+0x41/0x123
[4302727.900184] [<ffffffff80439638>] _spin_lock_irqsave+0x2f/0x37
[4302727.900184] [<ffffffff8022ef7c>] print_cfs_rq+0xca/0x46a
[4302727.900184] [<ffffffff80231f97>] sched_debug_show+0x7a3/0xb8c
[4302727.900184] [<ffffffff8023238d>] sysrq_sched_debug_show+0xd/0xf
[4302727.900184] [<ffffffff802323ee>] pick_next_task_fair+0x5f/0x86
[4302727.900184] [<ffffffff804373f6>] schedule+0x3fc/0x6ab
[4302727.900184] [<ffffffff8024e03f>] ? ktime_get_ts+0x49/0x4e
[4302727.900184] [<ffffffff80253a28>] ? tick_nohz_stop_idle+0x2d/0x54
[4302727.900184] [<ffffffff8021283f>] ? mwait_idle+0x0/0x59
[4302727.900184] [<ffffffff8020ae37>] cpu_idle+0xc8/0xd7
[4302727.900184] [<ffffffff804332f1>] start_secondary+0x173/0x178
[4302727.900184]
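A plausible reading of why the CPU#3 dump stops at cfs_rq[3]: the call trace shows the dump being triggered from pick_next_task_fair() inside schedule(), which presumably already holds CPU#3's rq->lock, and print_cfs_rq() then takes that same lock via spin_lock_irqsave(). The CPU#0 and CPU#1 sections print fine because those are other runqueues' locks. The sketch below is a hypothetical single-threaded model of the owner tracking that a debug spinlock uses to report "spinlock recursion" instead of spinning forever. The names `debug_spinlock`, `debug_spin_lock()` and `debug_spin_unlock()` are illustrative, not the kernel's API. The kernel's check also compares the owning task; this model keeps only the owning CPU, and it reuses the 0xdead4ead magic value seen in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_SPINLOCK_MAGIC 0xdead4eadU /* matches ".magic: dead4ead" above */
#define NO_OWNER (-1)

/* Hypothetical debug spinlock: only the bookkeeping, no real spinning. */
struct debug_spinlock {
    unsigned int magic;  /* sanity marker for an initialized lock */
    int owner_cpu;       /* CPU holding the lock, or NO_OWNER */
};

/* Takes the lock for `cpu`. Returns false and reports recursion if that
 * CPU already owns it, which on a real spinlock would deadlock. */
static bool debug_spin_lock(struct debug_spinlock *lock, int cpu)
{
    assert(lock->magic == MODEL_SPINLOCK_MAGIC);
    if (lock->owner_cpu == cpu) {
        fprintf(stderr, "BUG: spinlock recursion on CPU#%d\n", cpu);
        return false;
    }
    /* Single-threaded model: contention from another CPU is not simulated. */
    assert(lock->owner_cpu == NO_OWNER);
    lock->owner_cpu = cpu;
    return true;
}

/* Releases the lock; only the owning CPU may do so. */
static void debug_spin_unlock(struct debug_spinlock *lock, int cpu)
{
    assert(lock->owner_cpu == cpu);
    lock->owner_cpu = NO_OWNER;
}
```

In this model, a first debug_spin_lock(&rq_lock, 3) stands for schedule() taking the runqueue lock, and a second call on the same CPU stands for print_cfs_rq() re-taking it, which reports recursion rather than hanging.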
--
error compiling committee.c: too many arguments to function
Thread overview: 12+ messages
2008-05-13 14:33 [BUG] cpu hotplug vs scheduler Avi Kivity
2008-05-13 15:33 ` Avi Kivity
2008-05-13 19:00 ` Heiko Carstens
2008-05-14 8:13 ` Dmitry Adamushko
2008-05-14 12:30 ` Avi Kivity [this message]
2008-05-14 13:05 ` Dmitry Adamushko
2008-05-15 10:19 ` Avi Kivity
2008-05-21 12:31 ` Heiko Carstens
2008-05-21 12:42 ` Avi Kivity
2008-05-21 12:55 ` Heiko Carstens
2008-05-21 13:03 ` Avi Kivity
2008-05-21 14:48 ` [BUG] hotplug cpus on ia64 Cliff Wickman