* kernel BUG at kernel/sched_rt.c:322!
From: Paul E. McKenney @ 2008-10-09 1:14 UTC
To: rjw; +Cc: linux-kernel

When I enable:

	CONFIG_GROUP_SCHED=y
	CONFIG_FAIR_GROUP_SCHED=y
	CONFIG_USER_SCHED=y

and run a bash script onlining and offlining CPUs in an infinite loop
on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.

On the off-chance that this is new news...

						Thanx, Paul

[ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
[ 5538.091011] invalid opcode: 0000 [#1] SMP
[ 5538.091011] Modules linked in:
[ 5538.091011]
[ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
[ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
[ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
[ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
[ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
[ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
[ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00
[ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000
[ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48
[ 5538.091011] Call Trace:
[ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
[ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
[ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
[ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490
[ 5538.091011]  [<c013dfc1>] ? sched_clock_cpu+0x121/0x170
[ 5538.091011]  [<c011b56e>] ? update_curr+0x4e/0x80
[ 5538.091011]  [<c013ca8f>] ? hrtimer_run_pending+0x1f/0x90
[ 5538.091011]  [<c013c350>] ? enqueue_hrtimer+0x60/0x80
[ 5538.091011]  [<c011bc47>] ? __enqueue_entity+0xc7/0x100
[ 5538.091011]  [<c01223de>] ? partition_sched_domains+0x1ae/0x220
[ 5538.091011]  [<c012008f>] ? wake_up_process+0xf/0x20
[ 5538.091011]  [<c0122476>] ? update_sched_domains+0x26/0x40
[ 5538.091011]  [<c0374907>] ? notifier_call_chain+0x37/0x80
[ 5538.091011]  [<c013d379>] ? __raw_notifier_call_chain+0x19/0x20
[ 5538.091011]  [<c013d39a>] ? raw_notifier_call_chain+0x1a/0x20
[ 5538.091011]  [<c036e9cf>] ? _cpu_up+0xaf/0x100
[ 5538.091011]  [<c037125e>] ? mutex_lock+0xe/0x20
[ 5538.091011]  [<c036ea69>] ? cpu_up+0x49/0x70
[ 5538.091011]  [<c03619d8>] ? store_online+0x58/0x80
[ 5538.091011]  [<c0361980>] ? store_online+0x0/0x80
[ 5538.091011]  [<c0266f1c>] ? sysdev_store+0x2c/0x40
[ 5538.091011]  [<c01b728d>] ? sysfs_write_file+0x9d/0x100
[ 5538.091011]  [<c0175129>] ? vfs_write+0x99/0x130
[ 5538.091011]  [<c01b71f0>] ? sysfs_write_file+0x0/0x100
[ 5538.091011]  [<c017566d>] ? sys_write+0x3d/0x70
[ 5538.091011]  [<c010318e>] ? syscall_call+0x7/0xb
[ 5538.091011]  [<c0370000>] ? acpi_processor_start+0x630/0x63f
[ 5538.091011]  =======================
[ 5538.091011] Code: 87 72 ff ff ff 29 b3 4c 03 00 00 19 bb 50 03 00 00 31 f6 31 ff eb 9a 09 fe 0f 95 c0 0f b6 d8 8b 45 dc e8 6d 5f 25 00 85 db 74 a4 <0f> 0b eb fe 90 8d 74 26 00 55 89 d0 89 e5 83 ec 08 83 fa 16 89
[ 5538.091011] EIP: [<c011c287>] __disable_runtime+0x1c7/0x1d0 SS:ESP 0068:f6df7ca8
[ 5538.091011] ---[ end trace 5b3bf11f31634d39 ]---
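For readers who want to try a similar load, the online/offline loop described in the report can be sketched roughly as below. This is not the script that produced the oops. It is a minimal C sketch that assumes the usual sysfs layout (/sys/devices/system/cpu/cpuN/online, consistent with the store_online() and sysfs_write_file() frames in the trace), and the names set_cpu_online() and hotplug_pass() are made up for illustration. The caller passes the CPU directory and the CPU range, so nothing here depends on a particular machine.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "0" or "1" to <cpu_dir>/cpu<N>/online.
 * Returns 0 on success and -1 on any failure.
 */
int set_cpu_online(const char *cpu_dir, int cpu, int online)
{
	char path[256];
	FILE *f;
	int n;

	assert(cpu_dir != NULL);

	n = snprintf(path, sizeof(path), "%s/cpu%d/online", cpu_dir, cpu);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (f == NULL)
		return -1;

	if (fputs(online ? "1" : "0", f) == EOF) {
		fclose(f);
		return -1;
	}

	/* stdio buffers the write, so a rejected request shows up at fclose(). */
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: one pass over CPUs [first, last), taking each CPU
 * offline and then bringing it back online. Returns the number of failed
 * writes so the caller can decide whether to keep going.
 */
int hotplug_pass(const char *cpu_dir, int first, int last)
{
	int failures = 0;

	for (int cpu = first; cpu < last; cpu++) {
		if (set_cpu_online(cpu_dir, cpu, 0) != 0)
			failures++;
		if (set_cpu_online(cpu_dir, cpu, 1) != 0)
			failures++;
	}
	return failures;
}
```

Run as root, a caller would loop forever over something like hotplug_pass("/sys/devices/system/cpu", 1, 8) on an 8-CPU machine. Starting at CPU 1 is an assumption here, on the grounds that CPU 0 often cannot be taken offline on x86.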
* Re: kernel BUG at kernel/sched_rt.c:322!
From: Steven Noonan @ 2008-10-09 2:45 UTC
To: paulmck; +Cc: rjw, linux-kernel

On Wed, Oct 8, 2008 at 6:14 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> When I enable:
>
>	CONFIG_GROUP_SCHED=y
>	CONFIG_FAIR_GROUP_SCHED=y
>	CONFIG_USER_SCHED=y
>
> and run a bash script onlining and offlining CPUs in an infinite loop
> on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.

Is this a regression between 2.6.27-rc8 and -rc9, or did it crop up earlier?

- Steven
* Re: kernel BUG at kernel/sched_rt.c:322!
From: Paul E. McKenney @ 2008-10-09 2:57 UTC
To: Steven Noonan; +Cc: rjw, linux-kernel

On Wed, Oct 08, 2008 at 07:45:38PM -0700, Steven Noonan wrote:
> On Wed, Oct 8, 2008 at 6:14 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > When I enable:
> >
> >	CONFIG_GROUP_SCHED=y
> >	CONFIG_FAIR_GROUP_SCHED=y
> >	CONFIG_USER_SCHED=y
> >
> > and run a bash script onlining and offlining CPUs in an infinite loop
> > on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
>
> Is this a regression between 2.6.27-rc8 and -rc9, or did it crop up earlier?

Hello, Steve!

Good question.  I will run on older kernels.

						Thanx, Paul
* Re: kernel BUG at kernel/sched_rt.c:322!
From: Peter Zijlstra @ 2008-10-09 5:06 UTC
To: paulmck; +Cc: rjw, linux-kernel

On Wed, 2008-10-08 at 18:14 -0700, Paul E. McKenney wrote:
> When I enable:
>
>	CONFIG_GROUP_SCHED=y
>	CONFIG_FAIR_GROUP_SCHED=y
>	CONFIG_USER_SCHED=y
>
> and run a bash script onlining and offlining CPUs in an infinite loop
> on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
>
> On the off-chance that this is new news...

Hmm, yes. I thought I had all those fixed :-(

> [ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
> [ 5538.091011] invalid opcode: 0000 [#1] SMP
> [ 5538.091011] Modules linked in:
> [ 5538.091011]
> [ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
> [ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
> [ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
> [ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
> [ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
> [ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
> [ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00
> [ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000
> [ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48
> [ 5538.091011] Call Trace:
> [ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
> [ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
> [ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
> [ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490

At the very least we're doing part of the offline process twice it
seems, once through set_rq_offline()/set_rq_online() and once through
disable_runtime()/enable_runtime().

But seeing as we set an offlined cpu's runtime to RUNTIME_INF and skip
cpus with RUNTIME_INF runtime that should be harmless.

Modifications to rt_rq->rt_runtime are all done while holding
rt_b->rt_runtime_lock and rt_rq->rt_runtime_lock (do_balance_runtime()
and __disable_runtime() and __enable_runtime()). Which means it's enough
to hold either of those locks in order to get a stable reading of the
value.

Which leaves me puzzled for the moment...

tip/master has the following commit to clarify the code somewhat:


commit 78333cdd0e472180743d35988e576d6ecc6f6ddb
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Tue Sep 23 15:33:43 2008 +0200

    sched: add some comments to the bandwidth code

    Hopefully clarify some of this code a little.

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 2e228bd..d570a8c 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -231,6 +231,9 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
 #endif /* CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SMP
+/*
+ * We ran out of runtime, see if we can borrow some from our neighbours.
+ */
 static int do_balance_runtime(struct rt_rq *rt_rq)
 {
 	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
@@ -250,9 +253,18 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
 			continue;
 
 		spin_lock(&iter->rt_runtime_lock);
+		/*
+		 * Either all rqs have inf runtime and there's nothing to steal
+		 * or __disable_runtime() below sets a specific rq to inf to
+		 * indicate its been disabled and disalow stealing.
+		 */
 		if (iter->rt_runtime == RUNTIME_INF)
 			goto next;
 
+		/*
+		 * From runqueues with spare time, take 1/n part of their
+		 * spare time, but no more than our period.
+		 */
 		diff = iter->rt_runtime - iter->rt_time;
 		if (diff > 0) {
 			diff = div_u64((u64)diff, weight);
@@ -274,6 +286,9 @@ next:
 	return more;
 }
 
+/*
+ * Ensure this RQ takes back all the runtime it lend to its neighbours.
+ */
 static void __disable_runtime(struct rq *rq)
 {
 	struct root_domain *rd = rq->rd;
@@ -289,17 +304,33 @@ static void __disable_runtime(struct rq *rq)
 
 		spin_lock(&rt_b->rt_runtime_lock);
 		spin_lock(&rt_rq->rt_runtime_lock);
+		/*
+		 * Either we're all inf and nobody needs to borrow, or we're
+		 * already disabled and thus have nothing to do, or we have
+		 * exactly the right amount of runtime to take out.
+		 */
 		if (rt_rq->rt_runtime == RUNTIME_INF ||
 				rt_rq->rt_runtime == rt_b->rt_runtime)
 			goto balanced;
 		spin_unlock(&rt_rq->rt_runtime_lock);
 
+		/*
+		 * Calculate the difference between what we started out with
+		 * and what we current have, that's the amount of runtime
+		 * we lend and now have to reclaim.
+		 */
 		want = rt_b->rt_runtime - rt_rq->rt_runtime;
 
+		/*
+		 * Greedy reclaim, take back as much as we can.
+		 */
 		for_each_cpu_mask(i, rd->span) {
 			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
 			s64 diff;
 
+			/*
+			 * Can't reclaim from ourselves or disabled runqueues.
+			 */
 			if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
 				continue;
 
@@ -319,8 +350,16 @@ static void __disable_runtime(struct rq *rq)
 		}
 
 		spin_lock(&rt_rq->rt_runtime_lock);
+		/*
+		 * We cannot be left wanting - that would mean some runtime
+		 * leaked out of the system.
+		 */
 		BUG_ON(want);
 balanced:
+		/*
+		 * Disable all the borrow logic by pretending we have inf
+		 * runtime - in which case borrowing doesn't make sense.
+		 */
 		rt_rq->rt_runtime = RUNTIME_INF;
 		spin_unlock(&rt_rq->rt_runtime_lock);
 		spin_unlock(&rt_b->rt_runtime_lock);
@@ -343,6 +382,9 @@ static void __enable_runtime(struct rq *rq)
 	if (unlikely(!scheduler_running))
 		return;
 
+	/*
+	 * Reset each runqueue's bandwidth settings
+	 */
 	for_each_leaf_rt_rq(rt_rq, rq) {
 		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
 
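The accounting that BUG_ON(want) guards can be illustrated with a small single-threaded user-space model. This is a sketch under stated assumptions, not the kernel code: struct model_rq, model_init(), model_borrow(), model_disable(), NR_RQ and BASE_RUNTIME are hypothetical names, all locking is omitted, and the reclaim loop is a simplified reading of what __disable_runtime() does. It does not diagnose the reported BUG. It only shows that reclaim comes up short if every potential donor is already marked RUNTIME_INF while a loan is still outstanding.

```c
#include <assert.h>
#include <stdint.h>

#define NR_RQ        4            /* hypothetical number of runqueues */
#define RUNTIME_INF  UINT64_MAX   /* models the "disabled" marker */
#define BASE_RUNTIME 950          /* hypothetical per-queue budget */

/* Hypothetical stand-in for the two rt_rq fields the comments discuss. */
struct model_rq {
	uint64_t rt_runtime;  /* budget currently assigned to this queue */
	uint64_t rt_time;     /* budget already consumed this period */
};

void model_init(struct model_rq *rq)
{
	for (int i = 0; i < NR_RQ; i++) {
		rq[i].rt_runtime = BASE_RUNTIME;
		rq[i].rt_time = 0;
	}
}

/* Borrow 1/NR_RQ of each neighbour's spare time, capped at the period. */
void model_borrow(struct model_rq *rq, int self, uint64_t period)
{
	assert(rq[self].rt_runtime != RUNTIME_INF);
	assert(rq[self].rt_runtime <= period);

	for (int i = 0; i < NR_RQ; i++) {
		struct model_rq *iter = &rq[i];
		uint64_t diff;

		/* Skip ourselves, disabled queues, and queues with no spare time. */
		if (i == self || iter->rt_runtime == RUNTIME_INF)
			continue;
		if (iter->rt_runtime <= iter->rt_time)
			continue;

		diff = (iter->rt_runtime - iter->rt_time) / NR_RQ;
		if (rq[self].rt_runtime + diff > period)
			diff = period - rq[self].rt_runtime;
		iter->rt_runtime -= diff;
		rq[self].rt_runtime += diff;
		if (rq[self].rt_runtime == period)
			break;
	}
}

/*
 * Settle this queue's loans, then mark it disabled. Returns whatever could
 * not be settled; the kernel treats a nonzero remainder as a BUG.
 */
int64_t model_disable(struct model_rq *rq, int self)
{
	struct model_rq *me = &rq[self];
	int64_t want = 0;

	if (me->rt_runtime != RUNTIME_INF && me->rt_runtime != BASE_RUNTIME) {
		/* Positive: we lent runtime out. Negative: we borrowed some. */
		want = (int64_t)BASE_RUNTIME - (int64_t)me->rt_runtime;

		for (int i = 0; i < NR_RQ && want != 0; i++) {
			struct model_rq *iter = &rq[i];

			if (i == self || iter->rt_runtime == RUNTIME_INF)
				continue;

			if (want > 0) {
				/* Reclaim as much as this neighbour holds. */
				int64_t diff = want;

				if (iter->rt_runtime < (uint64_t)want)
					diff = (int64_t)iter->rt_runtime;
				iter->rt_runtime -= (uint64_t)diff;
				want -= diff;
			} else {
				/* Hand borrowed runtime back to a neighbour. */
				iter->rt_runtime += (uint64_t)(-want);
				want = 0;
			}
		}
	}

	me->rt_runtime = RUNTIME_INF;
	return want;
}
```

With four queues at 950 and a period of 1000, model_borrow(rq, 0, 1000) moves 50 units from queue 1 to queue 0, and model_disable(rq, 1) then returns 0 after taking them back. If the other three queues are marked RUNTIME_INF first, the same call returns 50, which corresponds to the condition the kernel reports as a BUG.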
* Re: kernel BUG at kernel/sched_rt.c:322!
From: Paul E. McKenney @ 2008-10-09 12:31 UTC
To: Peter Zijlstra; +Cc: rjw, linux-kernel

On Thu, Oct 09, 2008 at 07:06:38AM +0200, Peter Zijlstra wrote:
> On Wed, 2008-10-08 at 18:14 -0700, Paul E. McKenney wrote:
> > When I enable:
> >
> >	CONFIG_GROUP_SCHED=y
> >	CONFIG_FAIR_GROUP_SCHED=y
> >	CONFIG_USER_SCHED=y
> >
> > and run a bash script onlining and offlining CPUs in an infinite loop
> > on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
> >
> > On the off-chance that this is new news...
>
> Hmm, yes. I thought I had all those fixed :-(

I know that feeling!!!  ;-)

> > [ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
> > [ 5538.091011] invalid opcode: 0000 [#1] SMP
> > [ 5538.091011] Modules linked in:
> > [ 5538.091011]
> > [ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
> > [ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
> > [ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
> > [ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
> > [ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
> > [ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > [ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
> > [ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00
> > [ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000
> > [ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48
> > [ 5538.091011] Call Trace:
> > [ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
> > [ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
> > [ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
> > [ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490
>
> At the very least we're doing part of the offline process twice it
> seems, once through set_rq_offline()/set_rq_online() and once through
> disable_runtime()/enable_runtime().
>
> But seeing as we set an offlined cpu's runtime to RUNTIME_INF and skip
> cpus with RUNTIME_INF runtime that should be harmless.

Would double-processing a non-offlined CPU cause trouble, perhaps
setting the runtime to a nonsensical value?

> Modifications to rt_rq->rt_runtime are all done while holding
> rt_b->rt_runtime_lock and rt_rq->rt_runtime_lock (do_balance_runtime()
> and __disable_runtime() and __enable_runtime()). Which means it's enough
> to hold either of those locks in order to get a stable reading of the
> value.
>
> Which leaves me puzzled for the moment...

I know that feeling as well...

> tip/master has the following commit to clarify the code somewhat:
>
>
> commit 78333cdd0e472180743d35988e576d6ecc6f6ddb
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Tue Sep 23 15:33:43 2008 +0200
>
>     sched: add some comments to the bandwidth code
>
>     Hopefully clarify some of this code a little.
>
>     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index 2e228bd..d570a8c 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -231,6 +231,9 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
>  #endif /* CONFIG_RT_GROUP_SCHED */
>
>  #ifdef CONFIG_SMP
> +/*
> + * We ran out of runtime, see if we can borrow some from our neighbours.
> + */

Suppose that all CPUs nearby have run out of runtime.  Or is that
possible?

						Thanx, Paul

>  static int do_balance_runtime(struct rt_rq *rt_rq)
>  {
>  	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> @@ -250,9 +253,18 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
>  			continue;
>
>  		spin_lock(&iter->rt_runtime_lock);
> +		/*
> +		 * Either all rqs have inf runtime and there's nothing to steal
> +		 * or __disable_runtime() below sets a specific rq to inf to
> +		 * indicate its been disabled and disalow stealing.
> +		 */
>  		if (iter->rt_runtime == RUNTIME_INF)
>  			goto next;
>
> +		/*
> +		 * From runqueues with spare time, take 1/n part of their
> +		 * spare time, but no more than our period.
> +		 */
>  		diff = iter->rt_runtime - iter->rt_time;
>  		if (diff > 0) {
>  			diff = div_u64((u64)diff, weight);
> @@ -274,6 +286,9 @@ next:
>  	return more;
>  }
>
> +/*
> + * Ensure this RQ takes back all the runtime it lend to its neighbours.
> + */
>  static void __disable_runtime(struct rq *rq)
>  {
>  	struct root_domain *rd = rq->rd;
> @@ -289,17 +304,33 @@ static void __disable_runtime(struct rq *rq)
>
>  		spin_lock(&rt_b->rt_runtime_lock);
>  		spin_lock(&rt_rq->rt_runtime_lock);
> +		/*
> +		 * Either we're all inf and nobody needs to borrow, or we're
> +		 * already disabled and thus have nothing to do, or we have
> +		 * exactly the right amount of runtime to take out.
> +		 */
>  		if (rt_rq->rt_runtime == RUNTIME_INF ||
>  				rt_rq->rt_runtime == rt_b->rt_runtime)
>  			goto balanced;
>  		spin_unlock(&rt_rq->rt_runtime_lock);
>
> +		/*
> +		 * Calculate the difference between what we started out with
> +		 * and what we current have, that's the amount of runtime
> +		 * we lend and now have to reclaim.
> +		 */
>  		want = rt_b->rt_runtime - rt_rq->rt_runtime;
>
> +		/*
> +		 * Greedy reclaim, take back as much as we can.
> +		 */
>  		for_each_cpu_mask(i, rd->span) {
>  			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
>  			s64 diff;
>
> +			/*
> +			 * Can't reclaim from ourselves or disabled runqueues.
> +			 */
>  			if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
>  				continue;
>
> @@ -319,8 +350,16 @@ static void __disable_runtime(struct rq *rq)
>  		}
>
>  		spin_lock(&rt_rq->rt_runtime_lock);
> +		/*
> +		 * We cannot be left wanting - that would mean some runtime
> +		 * leaked out of the system.
> +		 */
>  		BUG_ON(want);
>  balanced:
> +		/*
> +		 * Disable all the borrow logic by pretending we have inf
> +		 * runtime - in which case borrowing doesn't make sense.
> +		 */
>  		rt_rq->rt_runtime = RUNTIME_INF;
>  		spin_unlock(&rt_rq->rt_runtime_lock);
>  		spin_unlock(&rt_b->rt_runtime_lock);
> @@ -343,6 +382,9 @@ static void __enable_runtime(struct rq *rq)
>  	if (unlikely(!scheduler_running))
>  		return;
>
> +	/*
> +	 * Reset each runqueue's bandwidth settings
> +	 */
>  	for_each_leaf_rt_rq(rt_rq, rq) {
>  		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
>
* Re: kernel BUG at kernel/sched_rt.c:322!
From: Zhang, Yanmin @ 2008-10-10 1:54 UTC
To: paulmck; +Cc: Peter Zijlstra, rjw, linux-kernel

On Thu, 2008-10-09 at 05:31 -0700, Paul E. McKenney wrote:
> On Thu, Oct 09, 2008 at 07:06:38AM +0200, Peter Zijlstra wrote:
> > On Wed, 2008-10-08 at 18:14 -0700, Paul E. McKenney wrote:
> > > When I enable:
> > >
> > >	CONFIG_GROUP_SCHED=y
> > >	CONFIG_FAIR_GROUP_SCHED=y
> > >	CONFIG_USER_SCHED=y
> > >
> > > and run a bash script onlining and offlining CPUs in an infinite loop
> > > on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
Paul,

Would you like to share your script? I tested CPU hotplug on my 8-core machine by
unplugging and replugging CPUs 2~5 in a loop for one night and didn't trigger the issue.

Did you set CONFIG_RT_GROUP_SCHED=y?

> > >
> > > On the off-chance that this is new news...
> >
> > Hmm, yes. I thought I had all those fixed :-(
>
> I know that feeling!!!  ;-)
>
> > > [ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
> > > [ 5538.091011] invalid opcode: 0000 [#1] SMP
> > > [ 5538.091011] Modules linked in:
> > > [ 5538.091011]
> > > [ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
> > > [ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
> > > [ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
> > > [ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
> > > [ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
> > > [ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > [ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
> > > [ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00
> > > [ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000
> > > [ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48
> > > [ 5538.091011] Call Trace:
> > > [ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
> > > [ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
> > > [ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
> > > [ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490
> >
> > At the very least we're doing part of the offline process twice it
> > seems, once through set_rq_offline()/set_rq_online() and once through
> > disable_runtime()/enable_runtime().
> >
> > But seeing as we set an offlined cpu's runtime to RUNTIME_INF and skip
> > cpus with RUNTIME_INF runtime that should be harmless.
>
> Would double-processing a non-offlined CPU cause trouble, perhaps
> setting the runtime to a nonsensical value?
>
> > Modifications to rt_rq->rt_runtime are all done while holding
> > rt_b->rt_runtime_lock and rt_rq->rt_runtime_lock (do_balance_runtime()
> > and __disable_runtime() and __enable_runtime()). Which means it's enough
> > to hold either of those locks in order to get a stable reading of the
> > value.
These locks, especially rt_b->rt_runtime_lock, prevent simultaneous
changes to rt_runtime. The code looks OK.

Anything related to RCU?

> >
> > Which leaves me puzzled for the moment...
>
> I know that feeling as well...
>
> > tip/master has the following commit to clarify the code somewhat:
> >
> >
> > commit 78333cdd0e472180743d35988e576d6ecc6f6ddb
> > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date:   Tue Sep 23 15:33:43 2008 +0200
> >
> >     sched: add some comments to the bandwidth code
> >
> >     Hopefully clarify some of this code a little.
> >
> >     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> >
> > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > index 2e228bd..d570a8c 100644
> > --- a/kernel/sched_rt.c
> > +++ b/kernel/sched_rt.c
> > @@ -231,6 +231,9 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
> >  #endif /* CONFIG_RT_GROUP_SCHED */
> >
> >  #ifdef CONFIG_SMP
> > +/*
> > + * We ran out of runtime, see if we can borrow some from our neighbours.
> > + */
>
> Suppose that all CPUs nearby have run out of runtime.  Or is that
> possible?
>
>						Thanx, Paul
>
> >  static int do_balance_runtime(struct rt_rq *rt_rq)
> >  {
> >  	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> > @@ -250,9 +253,18 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
> >  			continue;
> >
> >  		spin_lock(&iter->rt_runtime_lock);
> > +		/*
> > +		 * Either all rqs have inf runtime and there's nothing to steal
> > +		 * or __disable_runtime() below sets a specific rq to inf to
> > +		 * indicate its been disabled and disalow stealing.
> > +		 */
> >  		if (iter->rt_runtime == RUNTIME_INF)
> >  			goto next;
> >
> > +		/*
> > +		 * From runqueues with spare time, take 1/n part of their
> > +		 * spare time, but no more than our period.
> > +		 */
> >  		diff = iter->rt_runtime - iter->rt_time;
> >  		if (diff > 0) {
> >  			diff = div_u64((u64)diff, weight);
> > @@ -274,6 +286,9 @@ next:
> >  	return more;
> >  }
> >
> > +/*
> > + * Ensure this RQ takes back all the runtime it lend to its neighbours.
> > + */
> >  static void __disable_runtime(struct rq *rq)
> >  {
> >  	struct root_domain *rd = rq->rd;
> > @@ -289,17 +304,33 @@ static void __disable_runtime(struct rq *rq)
> >
> >  		spin_lock(&rt_b->rt_runtime_lock);
> >  		spin_lock(&rt_rq->rt_runtime_lock);
> > +		/*
> > +		 * Either we're all inf and nobody needs to borrow, or we're
> > +		 * already disabled and thus have nothing to do, or we have
> > +		 * exactly the right amount of runtime to take out.
> > +		 */
> >  		if (rt_rq->rt_runtime == RUNTIME_INF ||
> >  				rt_rq->rt_runtime == rt_b->rt_runtime)
> >  			goto balanced;
> >  		spin_unlock(&rt_rq->rt_runtime_lock);
> >
> > +		/*
> > +		 * Calculate the difference between what we started out with
> > +		 * and what we current have, that's the amount of runtime
> > +		 * we lend and now have to reclaim.
> > +		 */
> >  		want = rt_b->rt_runtime - rt_rq->rt_runtime;
> >
> > +		/*
> > +		 * Greedy reclaim, take back as much as we can.
> > +		 */
> >  		for_each_cpu_mask(i, rd->span) {
> >  			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
> >  			s64 diff;
> >
> > +			/*
> > +			 * Can't reclaim from ourselves or disabled runqueues.
> > +			 */
> >  			if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
> >  				continue;
> >
> > @@ -319,8 +350,16 @@ static void __disable_runtime(struct rq *rq)
> >  		}
> >
> >  		spin_lock(&rt_rq->rt_runtime_lock);
> > +		/*
> > +		 * We cannot be left wanting - that would mean some runtime
> > +		 * leaked out of the system.
> > +		 */
> >  		BUG_ON(want);
> >  balanced:
> > +		/*
> > +		 * Disable all the borrow logic by pretending we have inf
> > +		 * runtime - in which case borrowing doesn't make sense.
> > +		 */
> >  		rt_rq->rt_runtime = RUNTIME_INF;
> >  		spin_unlock(&rt_rq->rt_runtime_lock);
> >  		spin_unlock(&rt_b->rt_runtime_lock);
> > @@ -343,6 +382,9 @@ static void __enable_runtime(struct rq *rq)
> >  	if (unlikely(!scheduler_running))
> >  		return;
> >
> > +	/*
> > +	 * Reset each runqueue's bandwidth settings
> > +	 */
> >  	for_each_leaf_rt_rq(rt_rq, rq) {
> >  		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
* Re: kernel BUG at kernel/sched_rt.c:322!
From: Paul E. McKenney @ 2008-10-10 2:13 UTC
To: Zhang, Yanmin; +Cc: Peter Zijlstra, rjw, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 7865 bytes --]

On Fri, Oct 10, 2008 at 09:54:11AM +0800, Zhang, Yanmin wrote:
>
> On Thu, 2008-10-09 at 05:31 -0700, Paul E. McKenney wrote:
> > On Thu, Oct 09, 2008 at 07:06:38AM +0200, Peter Zijlstra wrote:
> > > On Wed, 2008-10-08 at 18:14 -0700, Paul E. McKenney wrote:
> > > > When I enable:
> > > >
> > > >	CONFIG_GROUP_SCHED=y
> > > >	CONFIG_FAIR_GROUP_SCHED=y
> > > >	CONFIG_USER_SCHED=y
> > > >
> > > > and run a bash script onlining and offlining CPUs in an infinite loop
> > > > on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
> Paul,
>
> Would you like to share your script? I tested CPU hotplug on my 8-core machine by
> unplugging and replugging CPUs 2~5 in a loop for one night and didn't trigger the issue.

See attached!  I hand-edit the loop for the machine at hand, so on
an 8-CPU x86 machine I would use "for ((i = 1; i < 8; i++))", given
that x86 machines tend not to allow you to offline CPU 0.

> Did you set CONFIG_RT_GROUP_SCHED=y?

No, I did not.

						Thanx, Paul

> > > > On the off-chance that this is new news...
> > >
> > > Hmm, yes. I thought I had all those fixed :-(
> >
> > I know that feeling!!!  ;-)
> >
> > > > [ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
> > > > [ 5538.091011] invalid opcode: 0000 [#1] SMP
> > > > [ 5538.091011] Modules linked in:
> > > > [ 5538.091011]
> > > > [ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
> > > > [ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
> > > > [ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
> > > > [ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
> > > > [ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
> > > > [ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > > [ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
> > > > [ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00
> > > > [ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000
> > > > [ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48
> > > > [ 5538.091011] Call Trace:
> > > > [ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
> > > > [ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
> > > > [ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
> > > > [ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490
> > >
> > > At the very least we're doing part of the offline process twice it
> > > seems, once through set_rq_offline()/set_rq_online() and once through
> > > disable_runtime()/enable_runtime().
> > >
> > > But seeing as we set an offlined cpu's runtime to RUNTIME_INF and skip
> > > cpus with RUNTIME_INF runtime that should be harmless.
> >
> > Would double-processing a non-offlined CPU cause trouble, perhaps
> > setting the runtime to a nonsensical value?
> >
> > > Modifications to rt_rq->rt_runtime are all done while holding
> > > rt_b->rt_runtime_lock and rt_rq->rt_runtime_lock (do_balance_runtime()
> > > and __disable_runtime() and __enable_runtime()). Which means it's enough
> > > to hold either of those locks in order to get a stable reading of the
> > > value.
> These locks, especially rt_b->rt_runtime_lock, prevent simultaneous
> changes to rt_runtime. The code looks OK.
>
> Anything related to RCU?
>
> > >
> > > Which leaves me puzzled for the moment...
> >
> > I know that feeling as well...
> >
> > > tip/master has the following commit to clarify the code somewhat:
> > >
> > >
> > > commit 78333cdd0e472180743d35988e576d6ecc6f6ddb
> > > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > Date:   Tue Sep 23 15:33:43 2008 +0200
> > >
> > >     sched: add some comments to the bandwidth code
> > >
> > >     Hopefully clarify some of this code a little.
> > >
> > >     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > >
> > > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > > index 2e228bd..d570a8c 100644
> > > --- a/kernel/sched_rt.c
> > > +++ b/kernel/sched_rt.c
> > > @@ -231,6 +231,9 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
> > >  #endif /* CONFIG_RT_GROUP_SCHED */
> > >
> > >  #ifdef CONFIG_SMP
> > > +/*
> > > + * We ran out of runtime, see if we can borrow some from our neighbours.
> > > + */
> >
> > Suppose that all CPUs nearby have run out of runtime.  Or is that
> > possible?
> >
> >						Thanx, Paul
> >
> > >  static int do_balance_runtime(struct rt_rq *rt_rq)
> > >  {
> > >  	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> > > @@ -250,9 +253,18 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
> > >  			continue;
> > >
> > >  		spin_lock(&iter->rt_runtime_lock);
> > > +		/*
> > > +		 * Either all rqs have inf runtime and there's nothing to steal
> > > +		 * or __disable_runtime() below sets a specific rq to inf to
> > > +		 * indicate its been disabled and disalow stealing.
> > > +		 */
> > >  		if (iter->rt_runtime == RUNTIME_INF)
> > >  			goto next;
> > >
> > > +		/*
> > > +		 * From runqueues with spare time, take 1/n part of their
> > > +		 * spare time, but no more than our period.
> > > +		 */
> > >  		diff = iter->rt_runtime - iter->rt_time;
> > >  		if (diff > 0) {
> > >  			diff = div_u64((u64)diff, weight);
> > > @@ -274,6 +286,9 @@ next:
> > >  	return more;
> > >  }
> > >
> > > +/*
> > > + * Ensure this RQ takes back all the runtime it lend to its neighbours.
> > > + */
> > >  static void __disable_runtime(struct rq *rq)
> > >  {
> > >  	struct root_domain *rd = rq->rd;
> > > @@ -289,17 +304,33 @@ static void __disable_runtime(struct rq *rq)
> > >
> > >  		spin_lock(&rt_b->rt_runtime_lock);
> > >  		spin_lock(&rt_rq->rt_runtime_lock);
> > > +		/*
> > > +		 * Either we're all inf and nobody needs to borrow, or we're
> > > +		 * already disabled and thus have nothing to do, or we have
> > > +		 * exactly the right amount of runtime to take out.
> > > +		 */
> > >  		if (rt_rq->rt_runtime == RUNTIME_INF ||
> > >  				rt_rq->rt_runtime == rt_b->rt_runtime)
> > >  			goto balanced;
> > >  		spin_unlock(&rt_rq->rt_runtime_lock);
> > >
> > > +		/*
> > > +		 * Calculate the difference between what we started out with
> > > +		 * and what we current have, that's the amount of runtime
> > > +		 * we lend and now have to reclaim.
> > > +		 */
> > >  		want = rt_b->rt_runtime - rt_rq->rt_runtime;
> > >
> > > +		/*
> > > +		 * Greedy reclaim, take back as much as we can.
> > > +		 */
> > >  		for_each_cpu_mask(i, rd->span) {
> > >  			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
> > >  			s64 diff;
> > >
> > > +			/*
> > > +			 * Can't reclaim from ourselves or disabled runqueues.
> > > +			 */
> > >  			if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
> > >  				continue;
> > >
> > > @@ -319,8 +350,16 @@ static void __disable_runtime(struct rq *rq)
> > >  		}
> > >
> > >  		spin_lock(&rt_rq->rt_runtime_lock);
> > > +		/*
> > > +		 * We cannot be left wanting - that would mean some runtime
> > > +		 * leaked out of the system.
> > > +		 */
> > >  		BUG_ON(want);
> > >  balanced:
> > > +		/*
> > > +		 * Disable all the borrow logic by pretending we have inf
> > > +		 * runtime - in which case borrowing doesn't make sense.
> > > +		 */
> > >  		rt_rq->rt_runtime = RUNTIME_INF;
> > >  		spin_unlock(&rt_rq->rt_runtime_lock);
> > >  		spin_unlock(&rt_b->rt_runtime_lock);
> > > @@ -343,6 +382,9 @@ static void __enable_runtime(struct rq *rq)
> > >  	if (unlikely(!scheduler_running))
> > >  		return;
> > >
> > > +	/*
> > > +	 * Reset each runqueue's bandwidth settings
> > > +	 */
> > >  	for_each_leaf_rt_rq(rt_rq, rq) {
> > >  		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
>
>

[-- Attachment #2: onofftorture128.sh --]
[-- Type: application/x-sh, Size: 669 bytes --]