* kernel BUG at kernel/sched_rt.c:322!
From: Paul E. McKenney @ 2008-10-09  1:14 UTC
  To: rjw; +Cc: linux-kernel

When I enable:

	CONFIG_GROUP_SCHED=y
	CONFIG_FAIR_GROUP_SCHED=y
	CONFIG_USER_SCHED=y

and run a bash script onlining and offlining CPUs in an infinite loop
on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.

On the off-chance that this is new news...

							Thanx, Paul

	[ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
	[ 5538.091011] invalid opcode: 0000 [#1] SMP 
	[ 5538.091011] Modules linked in:
	[ 5538.091011] 
	[ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
	[ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
	[ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
	[ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
	[ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
	[ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
	[ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
	[ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00 
	[ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000 
	[ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48 
	[ 5538.091011] Call Trace:
	[ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
	[ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
	[ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
	[ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490
	[ 5538.091011]  [<c013dfc1>] ? sched_clock_cpu+0x121/0x170
	[ 5538.091011]  [<c011b56e>] ? update_curr+0x4e/0x80
	[ 5538.091011]  [<c013ca8f>] ? hrtimer_run_pending+0x1f/0x90
	[ 5538.091011]  [<c013c350>] ? enqueue_hrtimer+0x60/0x80
	[ 5538.091011]  [<c011bc47>] ? __enqueue_entity+0xc7/0x100
	[ 5538.091011]  [<c01223de>] ? partition_sched_domains+0x1ae/0x220
	[ 5538.091011]  [<c012008f>] ? wake_up_process+0xf/0x20
	[ 5538.091011]  [<c0122476>] ? update_sched_domains+0x26/0x40
	[ 5538.091011]  [<c0374907>] ? notifier_call_chain+0x37/0x80
	[ 5538.091011]  [<c013d379>] ? __raw_notifier_call_chain+0x19/0x20
	[ 5538.091011]  [<c013d39a>] ? raw_notifier_call_chain+0x1a/0x20
	[ 5538.091011]  [<c036e9cf>] ? _cpu_up+0xaf/0x100
	[ 5538.091011]  [<c037125e>] ? mutex_lock+0xe/0x20
	[ 5538.091011]  [<c036ea69>] ? cpu_up+0x49/0x70
	[ 5538.091011]  [<c03619d8>] ? store_online+0x58/0x80
	[ 5538.091011]  [<c0361980>] ? store_online+0x0/0x80
	[ 5538.091011]  [<c0266f1c>] ? sysdev_store+0x2c/0x40
	[ 5538.091011]  [<c01b728d>] ? sysfs_write_file+0x9d/0x100
	[ 5538.091011]  [<c0175129>] ? vfs_write+0x99/0x130
	[ 5538.091011]  [<c01b71f0>] ? sysfs_write_file+0x0/0x100
	[ 5538.091011]  [<c017566d>] ? sys_write+0x3d/0x70
	[ 5538.091011]  [<c010318e>] ? syscall_call+0x7/0xb
	[ 5538.091011]  [<c0370000>] ? acpi_processor_start+0x630/0x63f
	[ 5538.091011]  =======================
	[ 5538.091011] Code: 87 72 ff ff ff 29 b3 4c 03 00 00 19 bb 50 03 00 00 31 f6 31 ff eb 9a 09 fe 0f 95 c0 0f b6 d8 8b 45 dc e8 6d 5f 25 00 85 db 74 a4 <0f> 0b eb fe 90 8d 74 26 00 55 89 d0 89 e5 83 ec 08 83 fa 16 89 
	[ 5538.091011] EIP: [<c011c287>] __disable_runtime+0x1c7/0x1d0 SS:ESP 0068:f6df7ca8
	[ 5538.091011] ---[ end trace 5b3bf11f31634d39 ]---


* Re: kernel BUG at kernel/sched_rt.c:322!
From: Steven Noonan @ 2008-10-09  2:45 UTC
  To: paulmck; +Cc: rjw, linux-kernel

On Wed, Oct 8, 2008 at 6:14 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> When I enable:
>
>        CONFIG_GROUP_SCHED=y
>        CONFIG_FAIR_GROUP_SCHED=y
>        CONFIG_USER_SCHED=y
>
> and run a bash script onlining and offlining CPUs in an infinite loop
> on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.

Is this a regression between 2.6.27-rc8 and -rc9, or did it crop up earlier?

- Steven


* Re: kernel BUG at kernel/sched_rt.c:322!
From: Paul E. McKenney @ 2008-10-09  2:57 UTC
  To: Steven Noonan; +Cc: rjw, linux-kernel

On Wed, Oct 08, 2008 at 07:45:38PM -0700, Steven Noonan wrote:
> On Wed, Oct 8, 2008 at 6:14 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > When I enable:
> >
> >        CONFIG_GROUP_SCHED=y
> >        CONFIG_FAIR_GROUP_SCHED=y
> >        CONFIG_USER_SCHED=y
> >
> > and run a bash script onlining and offlining CPUs in an infinite loop
> > on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
> 
> Is this a regression between 2.6.27-rc8 and -rc9, or did it crop up earlier?

Hello, Steve!

Good question.  I will run on older kernels.

							Thanx, Paul


* Re: kernel BUG at kernel/sched_rt.c:322!
From: Peter Zijlstra @ 2008-10-09  5:06 UTC
  To: paulmck; +Cc: rjw, linux-kernel

On Wed, 2008-10-08 at 18:14 -0700, Paul E. McKenney wrote:
> When I enable:
> 
> 	CONFIG_GROUP_SCHED=y
> 	CONFIG_FAIR_GROUP_SCHED=y
> 	CONFIG_USER_SCHED=y
> 
> and run a bash script onlining and offlining CPUs in an infinite loop
> on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
> 
> On the off-chance that this is new news...

Hmm, yes. I thought I had all those fixed :-(

> 	[ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
> 	[ 5538.091011] invalid opcode: 0000 [#1] SMP 
> 	[ 5538.091011] Modules linked in:
> 	[ 5538.091011] 
> 	[ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
> 	[ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
> 	[ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
> 	[ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
> 	[ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
> 	[ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> 	[ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
> 	[ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00 
> 	[ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000 
> 	[ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48 
> 	[ 5538.091011] Call Trace:
> 	[ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
> 	[ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
> 	[ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
> 	[ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490

At the very least we're doing part of the offline process twice, it
seems: once through set_rq_offline()/set_rq_online() and once through
disable_runtime()/enable_runtime().

But seeing as we set an offlined cpu's runtime to RUNTIME_INF and skip
cpus with RUNTIME_INF runtime, that should be harmless.

Modifications to rt_rq->rt_runtime are all done while holding both
rt_b->rt_runtime_lock and rt_rq->rt_runtime_lock (do_balance_runtime(),
__disable_runtime() and __enable_runtime()), which means it's enough
to hold either of those locks in order to get a stable reading of the
value.
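
Spelled out, the protocol looks like this (just a sketch in simplified
pseudo-C, not the actual kernel code; the helpers write_rt_runtime()
and read_rt_runtime() are made-up names for illustration):

	/* Writers take BOTH locks before touching rt_rq->rt_runtime. */
	static void write_rt_runtime(struct rt_bandwidth *rt_b,
				     struct rt_rq *rt_rq, u64 val)
	{
		spin_lock(&rt_b->rt_runtime_lock);
		spin_lock(&rt_rq->rt_runtime_lock);
		rt_rq->rt_runtime = val;
		spin_unlock(&rt_rq->rt_runtime_lock);
		spin_unlock(&rt_b->rt_runtime_lock);
	}

	/*
	 * Readers take EITHER lock: a writer needs both, so it cannot
	 * complete while a reader holds one of the two, and the value
	 * read is therefore stable.
	 */
	static u64 read_rt_runtime(struct rt_rq *rt_rq)
	{
		u64 val;

		spin_lock(&rt_rq->rt_runtime_lock);
		val = rt_rq->rt_runtime;
		spin_unlock(&rt_rq->rt_runtime_lock);
		return val;
	}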

Which leaves me puzzled for the moment...

tip/master has the following commit to clarify the code somewhat:


commit 78333cdd0e472180743d35988e576d6ecc6f6ddb
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Tue Sep 23 15:33:43 2008 +0200

    sched: add some comments to the bandwidth code
    
    Hopefully clarify some of this code a little.
    
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 2e228bd..d570a8c 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -231,6 +231,9 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
 #endif /* CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SMP
+/*
+ * We ran out of runtime, see if we can borrow some from our neighbours.
+ */
 static int do_balance_runtime(struct rt_rq *rt_rq)
 {
 	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
@@ -250,9 +253,18 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
 			continue;
 
 		spin_lock(&iter->rt_runtime_lock);
+		/*
+		 * Either all rqs have inf runtime and there's nothing to steal
+		 * or __disable_runtime() below sets a specific rq to inf to
+		 * indicate it's been disabled and disallow stealing.
+		 */
 		if (iter->rt_runtime == RUNTIME_INF)
 			goto next;
 
+		/*
+		 * From runqueues with spare time, take 1/n part of their
+		 * spare time, but no more than our period.
+		 */
 		diff = iter->rt_runtime - iter->rt_time;
 		if (diff > 0) {
 			diff = div_u64((u64)diff, weight);
@@ -274,6 +286,9 @@ next:
 	return more;
 }
 
+/*
+ * Ensure this RQ takes back all the runtime it lent to its neighbours.
+ */
 static void __disable_runtime(struct rq *rq)
 {
 	struct root_domain *rd = rq->rd;
@@ -289,17 +304,33 @@ static void __disable_runtime(struct rq *rq)
 
 		spin_lock(&rt_b->rt_runtime_lock);
 		spin_lock(&rt_rq->rt_runtime_lock);
+		/*
+		 * Either we're all inf and nobody needs to borrow, or we're
+		 * already disabled and thus have nothing to do, or we have
+		 * exactly the right amount of runtime to take out.
+		 */
 		if (rt_rq->rt_runtime == RUNTIME_INF ||
 				rt_rq->rt_runtime == rt_b->rt_runtime)
 			goto balanced;
 		spin_unlock(&rt_rq->rt_runtime_lock);
 
+		/*
+		 * Calculate the difference between what we started out with
+		 * and what we currently have; that's the amount of runtime
+		 * we lent and now have to reclaim.
+		 */
 		want = rt_b->rt_runtime - rt_rq->rt_runtime;
 
+		/*
+		 * Greedy reclaim, take back as much as we can.
+		 */
 		for_each_cpu_mask(i, rd->span) {
 			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
 			s64 diff;
 
+			/*
+			 * Can't reclaim from ourselves or disabled runqueues.
+			 */
 			if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
 				continue;
 
@@ -319,8 +350,16 @@ static void __disable_runtime(struct rq *rq)
 		}
 
 		spin_lock(&rt_rq->rt_runtime_lock);
+		/*
+		 * We cannot be left wanting - that would mean some runtime
+		 * leaked out of the system.
+		 */
 		BUG_ON(want);
 balanced:
+		/*
+		 * Disable all the borrow logic by pretending we have inf
+		 * runtime - in which case borrowing doesn't make sense.
+		 */
 		rt_rq->rt_runtime = RUNTIME_INF;
 		spin_unlock(&rt_rq->rt_runtime_lock);
 		spin_unlock(&rt_b->rt_runtime_lock);
@@ -343,6 +382,9 @@ static void __enable_runtime(struct rq *rq)
 	if (unlikely(!scheduler_running))
 		return;
 
+	/*
+	 * Reset each runqueue's bandwidth settings
+	 */
 	for_each_leaf_rt_rq(rt_rq, rq) {
 		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
 





* Re: kernel BUG at kernel/sched_rt.c:322!
From: Paul E. McKenney @ 2008-10-09 12:31 UTC
  To: Peter Zijlstra; +Cc: rjw, linux-kernel

On Thu, Oct 09, 2008 at 07:06:38AM +0200, Peter Zijlstra wrote:
> On Wed, 2008-10-08 at 18:14 -0700, Paul E. McKenney wrote:
> > When I enable:
> > 
> > 	CONFIG_GROUP_SCHED=y
> > 	CONFIG_FAIR_GROUP_SCHED=y
> > 	CONFIG_USER_SCHED=y
> > 
> > and run a bash script onlining and offlining CPUs in an infinite loop
> > on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
> > 
> > On the off-chance that this is new news...
> 
> Hmm, yes. I thought I had all those fixed :-(

I know that feeling!!!  ;-)

> > 	[ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
> > 	[ 5538.091011] invalid opcode: 0000 [#1] SMP 
> > 	[ 5538.091011] Modules linked in:
> > 	[ 5538.091011] 
> > 	[ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
> > 	[ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
> > 	[ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
> > 	[ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
> > 	[ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
> > 	[ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > 	[ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
> > 	[ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00 
> > 	[ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000 
> > 	[ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48 
> > 	[ 5538.091011] Call Trace:
> > 	[ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
> > 	[ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
> > 	[ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
> > 	[ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490
> 
> At the very least we're doing part of the offline process twice, it
> seems: once through set_rq_offline()/set_rq_online() and once through
> disable_runtime()/enable_runtime().
> 
> But seeing as we set an offlined cpu's runtime to RUNTIME_INF and skip
> cpus with RUNTIME_INF runtime, that should be harmless.

Would double-processing a non-offlined CPU cause trouble, perhaps
setting the runtime to a nonsensical value?

> Modifications to rt_rq->rt_runtime are all done while holding both
> rt_b->rt_runtime_lock and rt_rq->rt_runtime_lock (do_balance_runtime(),
> __disable_runtime() and __enable_runtime()), which means it's enough
> to hold either of those locks in order to get a stable reading of the
> value.
> 
> Which leaves me puzzled for the moment...

I know that feeling as well...

> tip/master has the following commit to clarify the code somewhat:
> 
> 
> commit 78333cdd0e472180743d35988e576d6ecc6f6ddb
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Tue Sep 23 15:33:43 2008 +0200
> 
>     sched: add some comments to the bandwidth code
>     
>     Hopefully clarify some of this code a little.
>     
>     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index 2e228bd..d570a8c 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -231,6 +231,9 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
>  #endif /* CONFIG_RT_GROUP_SCHED */
> 
>  #ifdef CONFIG_SMP
> +/*
> + * We ran out of runtime, see if we can borrow some from our neighbours.
> + */

Suppose that all CPUs nearby have run out of runtime.  Or is that
possible?

							Thanx, Paul

>  static int do_balance_runtime(struct rt_rq *rt_rq)
>  {
>  	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> @@ -250,9 +253,18 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
>  			continue;
> 
>  		spin_lock(&iter->rt_runtime_lock);
> +		/*
> +		 * Either all rqs have inf runtime and there's nothing to steal
> +		 * or __disable_runtime() below sets a specific rq to inf to
> +		 * indicate it's been disabled and disallow stealing.
> +		 */
>  		if (iter->rt_runtime == RUNTIME_INF)
>  			goto next;
> 
> +		/*
> +		 * From runqueues with spare time, take 1/n part of their
> +		 * spare time, but no more than our period.
> +		 */
>  		diff = iter->rt_runtime - iter->rt_time;
>  		if (diff > 0) {
>  			diff = div_u64((u64)diff, weight);
> @@ -274,6 +286,9 @@ next:
>  	return more;
>  }
> 
> +/*
> + * Ensure this RQ takes back all the runtime it lent to its neighbours.
> + */
>  static void __disable_runtime(struct rq *rq)
>  {
>  	struct root_domain *rd = rq->rd;
> @@ -289,17 +304,33 @@ static void __disable_runtime(struct rq *rq)
> 
>  		spin_lock(&rt_b->rt_runtime_lock);
>  		spin_lock(&rt_rq->rt_runtime_lock);
> +		/*
> +		 * Either we're all inf and nobody needs to borrow, or we're
> +		 * already disabled and thus have nothing to do, or we have
> +		 * exactly the right amount of runtime to take out.
> +		 */
>  		if (rt_rq->rt_runtime == RUNTIME_INF ||
>  				rt_rq->rt_runtime == rt_b->rt_runtime)
>  			goto balanced;
>  		spin_unlock(&rt_rq->rt_runtime_lock);
> 
> +		/*
> +		 * Calculate the difference between what we started out with
> +		 * and what we currently have; that's the amount of runtime
> +		 * we lent and now have to reclaim.
> +		 */
>  		want = rt_b->rt_runtime - rt_rq->rt_runtime;
> 
> +		/*
> +		 * Greedy reclaim, take back as much as we can.
> +		 */
>  		for_each_cpu_mask(i, rd->span) {
>  			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
>  			s64 diff;
> 
> +			/*
> +			 * Can't reclaim from ourselves or disabled runqueues.
> +			 */
>  			if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
>  				continue;
> 
> @@ -319,8 +350,16 @@ static void __disable_runtime(struct rq *rq)
>  		}
> 
>  		spin_lock(&rt_rq->rt_runtime_lock);
> +		/*
> +		 * We cannot be left wanting - that would mean some runtime
> +		 * leaked out of the system.
> +		 */
>  		BUG_ON(want);
>  balanced:
> +		/*
> +		 * Disable all the borrow logic by pretending we have inf
> +		 * runtime - in which case borrowing doesn't make sense.
> +		 */
>  		rt_rq->rt_runtime = RUNTIME_INF;
>  		spin_unlock(&rt_rq->rt_runtime_lock);
>  		spin_unlock(&rt_b->rt_runtime_lock);
> @@ -343,6 +382,9 @@ static void __enable_runtime(struct rq *rq)
>  	if (unlikely(!scheduler_running))
>  		return;
> 
> +	/*
> +	 * Reset each runqueue's bandwidth settings
> +	 */
>  	for_each_leaf_rt_rq(rt_rq, rq) {
>  		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> 
> 
> 
> 


* Re: kernel BUG at kernel/sched_rt.c:322!
From: Zhang, Yanmin @ 2008-10-10  1:54 UTC
  To: paulmck; +Cc: Peter Zijlstra, rjw, linux-kernel


On Thu, 2008-10-09 at 05:31 -0700, Paul E. McKenney wrote:
> On Thu, Oct 09, 2008 at 07:06:38AM +0200, Peter Zijlstra wrote:
> > On Wed, 2008-10-08 at 18:14 -0700, Paul E. McKenney wrote:
> > > When I enable:
> > > 
> > > 	CONFIG_GROUP_SCHED=y
> > > 	CONFIG_FAIR_GROUP_SCHED=y
> > > 	CONFIG_USER_SCHED=y
> > > 
> > > and run a bash script onlining and offlining CPUs in an infinite loop
> > > on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
Paul,

Would you like to share your script? I tested cpu hotplug on my 8-core machine by
unplugging and replugging cpus 2~5 in a loop for one night and didn't trigger the issue.

Did you set CONFIG_RT_GROUP_SCHED=y?

> > > 
> > > On the off-chance that this is new news...
> > 
> > Hmm, yes. I thought I had all those fixed :-(
> 
> I know that feeling!!!  ;-)
> 
> > > 	[ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
> > > 	[ 5538.091011] invalid opcode: 0000 [#1] SMP 
> > > 	[ 5538.091011] Modules linked in:
> > > 	[ 5538.091011] 
> > > 	[ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
> > > 	[ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
> > > 	[ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
> > > 	[ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
> > > 	[ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
> > > 	[ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > 	[ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
> > > 	[ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00 
> > > 	[ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000 
> > > 	[ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48 
> > > 	[ 5538.091011] Call Trace:
> > > 	[ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
> > > 	[ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
> > > 	[ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
> > > 	[ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490
> > 
> > At the very least we're doing part of the offline process twice, it
> > seems: once through set_rq_offline()/set_rq_online() and once through
> > disable_runtime()/enable_runtime().
> > 
> > But seeing as we set an offlined cpu's runtime to RUNTIME_INF and skip
> > cpus with RUNTIME_INF runtime, that should be harmless.
> 
> Would double-processing a non-offlined CPU cause trouble, perhaps
> setting the runtime to a nonsensical value?
> 
> > Modifications to rt_rq->rt_runtime are all done while holding both
> > rt_b->rt_runtime_lock and rt_rq->rt_runtime_lock (do_balance_runtime(),
> > __disable_runtime() and __enable_runtime()), which means it's enough
> > to hold either of those locks in order to get a stable reading of the
> > value.
These locks, especially rt_b->rt_runtime_lock, prevent simultaneous
changes to rt_runtime. The code looks OK.

Anything related to RCU?

> > 
> > Which leaves me puzzled for the moment...
> 
> I know that feeling as well...
> 
> > tip/master has the following commit to clarify the code somewhat:
> > 
> > 
> > commit 78333cdd0e472180743d35988e576d6ecc6f6ddb
> > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date:   Tue Sep 23 15:33:43 2008 +0200
> > 
> >     sched: add some comments to the bandwidth code
> >     
> >     Hopefully clarify some of this code a little.
> >     
> >     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > 
> > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > index 2e228bd..d570a8c 100644
> > --- a/kernel/sched_rt.c
> > +++ b/kernel/sched_rt.c
> > @@ -231,6 +231,9 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
> >  #endif /* CONFIG_RT_GROUP_SCHED */
> > 
> >  #ifdef CONFIG_SMP
> > +/*
> > + * We ran out of runtime, see if we can borrow some from our neighbours.
> > + */
> 
> Suppose that all CPUs nearby have run out of runtime.  Or is that
> possible?
> 
> 							Thanx, Paul
> 
> >  static int do_balance_runtime(struct rt_rq *rt_rq)
> >  {
> >  	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> > @@ -250,9 +253,18 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
> >  			continue;
> > 
> >  		spin_lock(&iter->rt_runtime_lock);
> > +		/*
> > +		 * Either all rqs have inf runtime and there's nothing to steal
> > +		 * or __disable_runtime() below sets a specific rq to inf to
> > +		 * indicate it's been disabled and disallow stealing.
> > +		 */
> >  		if (iter->rt_runtime == RUNTIME_INF)
> >  			goto next;
> > 
> > +		/*
> > +		 * From runqueues with spare time, take 1/n part of their
> > +		 * spare time, but no more than our period.
> > +		 */
> >  		diff = iter->rt_runtime - iter->rt_time;
> >  		if (diff > 0) {
> >  			diff = div_u64((u64)diff, weight);
> > @@ -274,6 +286,9 @@ next:
> >  	return more;
> >  }
> > 
> > +/*
> > + * Ensure this RQ takes back all the runtime it lent to its neighbours.
> > + */
> >  static void __disable_runtime(struct rq *rq)
> >  {
> >  	struct root_domain *rd = rq->rd;
> > @@ -289,17 +304,33 @@ static void __disable_runtime(struct rq *rq)
> > 
> >  		spin_lock(&rt_b->rt_runtime_lock);
> >  		spin_lock(&rt_rq->rt_runtime_lock);
> > +		/*
> > +		 * Either we're all inf and nobody needs to borrow, or we're
> > +		 * already disabled and thus have nothing to do, or we have
> > +		 * exactly the right amount of runtime to take out.
> > +		 */
> >  		if (rt_rq->rt_runtime == RUNTIME_INF ||
> >  				rt_rq->rt_runtime == rt_b->rt_runtime)
> >  			goto balanced;
> >  		spin_unlock(&rt_rq->rt_runtime_lock);
> > 
> > +		/*
> > +		 * Calculate the difference between what we started out with
> > +		 * and what we currently have; that's the amount of runtime
> > +		 * we lent and now have to reclaim.
> > +		 */
> >  		want = rt_b->rt_runtime - rt_rq->rt_runtime;
> > 
> > +		/*
> > +		 * Greedy reclaim, take back as much as we can.
> > +		 */
> >  		for_each_cpu_mask(i, rd->span) {
> >  			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
> >  			s64 diff;
> > 
> > +			/*
> > +			 * Can't reclaim from ourselves or disabled runqueues.
> > +			 */
> >  			if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
> >  				continue;
> > 
> > @@ -319,8 +350,16 @@ static void __disable_runtime(struct rq *rq)
> >  		}
> > 
> >  		spin_lock(&rt_rq->rt_runtime_lock);
> > +		/*
> > +		 * We cannot be left wanting - that would mean some runtime
> > +		 * leaked out of the system.
> > +		 */
> >  		BUG_ON(want);
> >  balanced:
> > +		/*
> > +		 * Disable all the borrow logic by pretending we have inf
> > +		 * runtime - in which case borrowing doesn't make sense.
> > +		 */
> >  		rt_rq->rt_runtime = RUNTIME_INF;
> >  		spin_unlock(&rt_rq->rt_runtime_lock);
> >  		spin_unlock(&rt_b->rt_runtime_lock);
> > @@ -343,6 +382,9 @@ static void __enable_runtime(struct rq *rq)
> >  	if (unlikely(!scheduler_running))
> >  		return;
> > 
> > +	/*
> > +	 * Reset each runqueue's bandwidth settings
> > +	 */
> >  	for_each_leaf_rt_rq(rt_rq, rq) {
> >  		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);




* Re: kernel BUG at kernel/sched_rt.c:322!
From: Paul E. McKenney @ 2008-10-10  2:13 UTC
  To: Zhang, Yanmin; +Cc: Peter Zijlstra, rjw, linux-kernel

On Fri, Oct 10, 2008 at 09:54:11AM +0800, Zhang, Yanmin wrote:
> 
> On Thu, 2008-10-09 at 05:31 -0700, Paul E. McKenney wrote:
> > On Thu, Oct 09, 2008 at 07:06:38AM +0200, Peter Zijlstra wrote:
> > > On Wed, 2008-10-08 at 18:14 -0700, Paul E. McKenney wrote:
> > > > When I enable:
> > > > 
> > > > 	CONFIG_GROUP_SCHED=y
> > > > 	CONFIG_FAIR_GROUP_SCHED=y
> > > > 	CONFIG_USER_SCHED=y
> > > > 
> > > > and run a bash script onlining and offlining CPUs in an infinite loop
> > > > on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
> Paul,
> 
> Would you like to share your script? I tested cpu hotplug on my 8-core machine by
> unplugging and replugging cpus 2~5 in a loop for one night and didn't trigger the issue.

See attached!  I hand-edit the loop for the machine at hand, so on an
8-CPU x86 machine I would use "for ((i = 1; i < 8; i++))", given that
x86 machines tend not to allow you to offline CPU 0.
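
In outline it does something like the following -- a minimal sketch of
that kind of loop, not the attached file itself; the 8-CPU bound and
the standard sysfs online files are the knobs to adjust per machine:

	#!/bin/bash
	# Torture loop: repeatedly offline and then re-online each
	# hot-pluggable CPU.  CPU 0 is skipped, since x86 typically
	# does not allow offlining it.
	while :
	do
		for ((i = 1; i < 8; i++))
		do
			echo 0 > /sys/devices/system/cpu/cpu$i/online
			echo 1 > /sys/devices/system/cpu/cpu$i/online
		done
	done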

> Did you set CONFIG_RT_GROUP_SCHED=y?

No, I did not.

							Thanx, Paul

> > > > On the off-chance that this is new news...
> > > 
> > > Hmm, yes. I thought I had all those fixed :-(
> > 
> > I know that feeling!!!  ;-)
> > 
> > > > 	[ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
> > > > 	[ 5538.091011] invalid opcode: 0000 [#1] SMP 
> > > > 	[ 5538.091011] Modules linked in:
> > > > 	[ 5538.091011] 
> > > > 	[ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
> > > > 	[ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
> > > > 	[ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
> > > > 	[ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
> > > > 	[ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
> > > > 	[ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > > 	[ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
> > > > 	[ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00 
> > > > 	[ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000 
> > > > 	[ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48 
> > > > 	[ 5538.091011] Call Trace:
> > > > 	[ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
> > > > 	[ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
> > > > 	[ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
> > > > 	[ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490
> > > 
> > > At the very least we're doing part of the offline process twice, it
> > > seems: once through set_rq_offline()/set_rq_online() and once through
> > > disable_runtime()/enable_runtime().
> > > 
> > > But seeing as we set an offlined cpu's runtime to RUNTIME_INF and skip
> > > cpus with RUNTIME_INF runtime, that should be harmless.
> > 
> > Would double-processing a non-offlined CPU cause trouble, perhaps
> > setting the runtime to a nonsensical value?
> > 
> > > Modifications to rt_rq->rt_runtime are all done while holding both
> > > rt_b->rt_runtime_lock and rt_rq->rt_runtime_lock (do_balance_runtime(),
> > > __disable_runtime() and __enable_runtime()), which means it's enough
> > > to hold either of those locks in order to get a stable reading of the
> > > value.
> These locks, especially rt_b->rt_runtime_lock, prevent simultaneous
> changes to rt_runtime. The code looks OK.
> 
> Anything related to RCU?
> 
> > > 
> > > Which leaves me puzzled for the moment...
> > 
> > I know that feeling as well...
> > 
> > > tip/master has the following commit to clarify the code somewhat:
> > > 
> > > 
> > > commit 78333cdd0e472180743d35988e576d6ecc6f6ddb
> > > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > Date:   Tue Sep 23 15:33:43 2008 +0200
> > > 
> > >     sched: add some comments to the bandwidth code
> > >     
> > >     Hopefully clarify some of this code a little.
> > >     
> > >     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > 
> > > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > > index 2e228bd..d570a8c 100644
> > > --- a/kernel/sched_rt.c
> > > +++ b/kernel/sched_rt.c
> > > @@ -231,6 +231,9 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
> > >  #endif /* CONFIG_RT_GROUP_SCHED */
> > > 
> > >  #ifdef CONFIG_SMP
> > > +/*
> > > + * We ran out of runtime, see if we can borrow some from our neighbours.
> > > + */
> > 
> > Suppose that all CPUs nearby have run out of runtime.  Or is that
> > possible?
> > 
> > 							Thanx, Paul
> > 
> > >  static int do_balance_runtime(struct rt_rq *rt_rq)
> > >  {
> > >  	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> > > @@ -250,9 +253,18 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
> > >  			continue;
> > > 
> > >  		spin_lock(&iter->rt_runtime_lock);
> > > +		/*
> > > +		 * Either all rqs have inf runtime and there's nothing to steal
> > > +		 * or __disable_runtime() below sets a specific rq to inf to
> > > +		 * indicate it's been disabled and disallow stealing.
> > > +		 */
> > >  		if (iter->rt_runtime == RUNTIME_INF)
> > >  			goto next;
> > > 
> > > +		/*
> > > +		 * From runqueues with spare time, take 1/n part of their
> > > +		 * spare time, but no more than our period.
> > > +		 */
> > >  		diff = iter->rt_runtime - iter->rt_time;
> > >  		if (diff > 0) {
> > >  			diff = div_u64((u64)diff, weight);
> > > @@ -274,6 +286,9 @@ next:
> > >  	return more;
> > >  }
> > > 
> > > +/*
> > > + * Ensure this RQ takes back all the runtime it lent to its neighbours.
> > > + */
> > >  static void __disable_runtime(struct rq *rq)
> > >  {
> > >  	struct root_domain *rd = rq->rd;
> > > @@ -289,17 +304,33 @@ static void __disable_runtime(struct rq *rq)
> > > 
> > >  		spin_lock(&rt_b->rt_runtime_lock);
> > >  		spin_lock(&rt_rq->rt_runtime_lock);
> > > +		/*
> > > +		 * Either we're all inf and nobody needs to borrow, or we're
> > > +		 * already disabled and thus have nothing to do, or we have
> > > +		 * exactly the right amount of runtime to take out.
> > > +		 */
> > >  		if (rt_rq->rt_runtime == RUNTIME_INF ||
> > >  				rt_rq->rt_runtime == rt_b->rt_runtime)
> > >  			goto balanced;
> > >  		spin_unlock(&rt_rq->rt_runtime_lock);
> > > 
> > > +		/*
> > > +		 * Calculate the difference between what we started out with
> > > +		 * and what we currently have; that's the amount of runtime
> > > +		 * we lent and now have to reclaim.
> > > +		 */
> > >  		want = rt_b->rt_runtime - rt_rq->rt_runtime;
> > > 
> > > +		/*
> > > +		 * Greedy reclaim, take back as much as we can.
> > > +		 */
> > >  		for_each_cpu_mask(i, rd->span) {
> > >  			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
> > >  			s64 diff;
> > > 
> > > +			/*
> > > +			 * Can't reclaim from ourselves or disabled runqueues.
> > > +			 */
> > >  			if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
> > >  				continue;
> > > 
> > > @@ -319,8 +350,16 @@ static void __disable_runtime(struct rq *rq)
> > >  		}
> > > 
> > >  		spin_lock(&rt_rq->rt_runtime_lock);
> > > +		/*
> > > +		 * We cannot be left wanting - that would mean some runtime
> > > +		 * leaked out of the system.
> > > +		 */
> > >  		BUG_ON(want);
> > >  balanced:
> > > +		/*
> > > +		 * Disable all the borrow logic by pretending we have inf
> > > +		 * runtime - in which case borrowing doesn't make sense.
> > > +		 */
> > >  		rt_rq->rt_runtime = RUNTIME_INF;
> > >  		spin_unlock(&rt_rq->rt_runtime_lock);
> > >  		spin_unlock(&rt_b->rt_runtime_lock);
> > > @@ -343,6 +382,9 @@ static void __enable_runtime(struct rq *rq)
> > >  	if (unlikely(!scheduler_running))
> > >  		return;
> > > 
> > > +	/*
> > > +	 * Reset each runqueue's bandwidth settings
> > > +	 */
> > >  	for_each_leaf_rt_rq(rt_rq, rq) {
> > >  		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> 
> 

[-- Attachment #2: onofftorture128.sh --]
[-- Type: application/x-sh, Size: 669 bytes --]

