All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: rjw@sisk.pl, linux-kernel@vger.kernel.org
Subject: Re: kernel BUG at kernel/sched_rt.c:322!
Date: Thu, 9 Oct 2008 05:31:32 -0700	[thread overview]
Message-ID: <20081009123132.GF6628@linux.vnet.ibm.com> (raw)
In-Reply-To: <1223528798.7382.27.camel@lappy.programming.kicks-ass.net>

On Thu, Oct 09, 2008 at 07:06:38AM +0200, Peter Zijlstra wrote:
> On Wed, 2008-10-08 at 18:14 -0700, Paul E. McKenney wrote:
> > When I enable:
> > 
> > 	CONFIG_GROUP_SCHED=y
> > 	CONFIG_FAIR_GROUP_SCHED=y
> > 	CONFIG_USER_SCHED=y
> > 
> > and run a bash script onlining and offlining CPUs in an infinite loop
> > on x86 using 2.6.27-rc9, after about 1.5 hours I get the following.
> > 
> > On the off-chance that this is new news...
> 
> Hmm, yes. I thought I had all those fixed :-(

I know that feeling!!!  ;-)

> > 	[ 5538.091011] kernel BUG at kernel/sched_rt.c:322!
> > 	[ 5538.091011] invalid opcode: 0000 [#1] SMP 
> > 	[ 5538.091011] Modules linked in:
> > 	[ 5538.091011] 
> > 	[ 5538.091011] Pid: 2819, comm: sh Not tainted (2.6.27-rc9-autokern1 #1)
> > 	[ 5538.091011] EIP: 0060:[<c011c287>] EFLAGS: 00010002 CPU: 7
> > 	[ 5538.091011] EIP is at __disable_runtime+0x1c7/0x1d0
> > 	[ 5538.091011] EAX: c9056eec EBX: 00000001 ECX: 00000008 EDX: 00006060
> > 	[ 5538.091011] ESI: 02faf080 EDI: 00000000 EBP: f6df7cd0 ESP: f6df7ca8
> > 	[ 5538.091011]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > 	[ 5538.091011] Process sh (pid: 2819, ti=f6df6000 task=f6cbdc00 task.ti=f6df6000)
> > 	[ 5538.091011] Stack: f68c8004 c9056eec f68c8000 c9056b98 00000008 5d353631 c04d0020 c9056b00 
> > 	[ 5538.091011]        c9056b00 c9056b00 f6df7cdc c011d151 c037dfc0 f6df7cec c011aedb f68c8000 
> > 	[ 5538.091011]        c04d2200 f6df7d04 c011f967 00000282 00000000 00000000 00000000 f6df7e48 
> > 	[ 5538.091011] Call Trace:
> > 	[ 5538.091011]  [<c011d151>] ? rq_offline_rt+0x21/0x60
> > 	[ 5538.091011]  [<c011aedb>] ? set_rq_offline+0x2b/0x50
> > 	[ 5538.091011]  [<c011f967>] ? rq_attach_root+0xa7/0xb0
> > 	[ 5538.091011]  [<c0120bbf>] ? cpu_attach_domain+0x30f/0x490
> 
> At the very least we're doing part of the offline process twice it
> seems, once through set_rq_offline()/set_rq_online() and once through
> disable_runtime()/enabled_runtime().
> 
> But seeing as we set an offlined cpu's runtime to RUNTIME_INF and skip
> cpus with RUNTIME_INF runtime that should be harmless.

Would double-processing a non-offlined CPU cause trouble, perhaps
setting the runtime to a nonsensical value?

> Modifications to rt_rq->rt_runtime are all done while holding
> rt_b->rt_runtime_lock and rt_rq->rt_runtime_lock (do_balance_runtime()
> and __disable_runtime() and __enable_runtime()). Which means its enough
> to hold either of those locks in order to get a stable reading of the
> value.
> 
> Which leaves me puzzled for the moment...

I know that feeling as well...

> tip/master has the following commit to clarify the code somewhat:
> 
> 
> commit 78333cdd0e472180743d35988e576d6ecc6f6ddb
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Tue Sep 23 15:33:43 2008 +0200
> 
>     sched: add some comments to the bandwidth code
>     
>     Hopefully clarify some of this code a little.
>     
>     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index 2e228bd..d570a8c 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -231,6 +231,9 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
>  #endif /* CONFIG_RT_GROUP_SCHED */
> 
>  #ifdef CONFIG_SMP
> +/*
> + * We ran out of runtime, see if we can borrow some from our neighbours.
> + */

Suppose that all CPUs nearby have run out of runtime.  Or is that
possible?

							Thanx, Paul

>  static int do_balance_runtime(struct rt_rq *rt_rq)
>  {
>  	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> @@ -250,9 +253,18 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
>  			continue;
> 
>  		spin_lock(&iter->rt_runtime_lock);
> +		/*
> +		 * Either all rqs have inf runtime and there's nothing to steal
> +		 * or __disable_runtime() below sets a specific rq to inf to
> +		 * indicate its been disabled and disalow stealing.
> +		 */
>  		if (iter->rt_runtime == RUNTIME_INF)
>  			goto next;
> 
> +		/*
> +		 * From runqueues with spare time, take 1/n part of their
> +		 * spare time, but no more than our period.
> +		 */
>  		diff = iter->rt_runtime - iter->rt_time;
>  		if (diff > 0) {
>  			diff = div_u64((u64)diff, weight);
> @@ -274,6 +286,9 @@ next:
>  	return more;
>  }
> 
> +/*
> + * Ensure this RQ takes back all the runtime it lend to its neighbours.
> + */
>  static void __disable_runtime(struct rq *rq)
>  {
>  	struct root_domain *rd = rq->rd;
> @@ -289,17 +304,33 @@ static void __disable_runtime(struct rq *rq)
> 
>  		spin_lock(&rt_b->rt_runtime_lock);
>  		spin_lock(&rt_rq->rt_runtime_lock);
> +		/*
> +		 * Either we're all inf and nobody needs to borrow, or we're
> +		 * already disabled and thus have nothing to do, or we have
> +		 * exactly the right amount of runtime to take out.
> +		 */
>  		if (rt_rq->rt_runtime == RUNTIME_INF ||
>  				rt_rq->rt_runtime == rt_b->rt_runtime)
>  			goto balanced;
>  		spin_unlock(&rt_rq->rt_runtime_lock);
> 
> +		/*
> +		 * Calculate the difference between what we started out with
> +		 * and what we current have, that's the amount of runtime
> +		 * we lend and now have to reclaim.
> +		 */
>  		want = rt_b->rt_runtime - rt_rq->rt_runtime;
> 
> +		/*
> +		 * Greedy reclaim, take back as much as we can.
> +		 */
>  		for_each_cpu_mask(i, rd->span) {
>  			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
>  			s64 diff;
> 
> +			/*
> +			 * Can't reclaim from ourselves or disabled runqueues.
> +			 */
>  			if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
>  				continue;
> 
> @@ -319,8 +350,16 @@ static void __disable_runtime(struct rq *rq)
>  		}
> 
>  		spin_lock(&rt_rq->rt_runtime_lock);
> +		/*
> +		 * We cannot be left wanting - that would mean some runtime
> +		 * leaked out of the system.
> +		 */
>  		BUG_ON(want);
>  balanced:
> +		/*
> +		 * Disable all the borrow logic by pretending we have inf
> +		 * runtime - in which case borrowing doesn't make sense.
> +		 */
>  		rt_rq->rt_runtime = RUNTIME_INF;
>  		spin_unlock(&rt_rq->rt_runtime_lock);
>  		spin_unlock(&rt_b->rt_runtime_lock);
> @@ -343,6 +382,9 @@ static void __enable_runtime(struct rq *rq)
>  	if (unlikely(!scheduler_running))
>  		return;
> 
> +	/*
> +	 * Reset each runqueue's bandwidth settings
> +	 */
>  	for_each_leaf_rt_rq(rt_rq, rq) {
>  		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> 
> 
> 
> 

  reply	other threads:[~2008-10-09 12:39 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-09  1:14 kernel BUG at kernel/sched_rt.c:322! Paul E. McKenney
2008-10-09  2:45 ` Steven Noonan
2008-10-09  2:57   ` Paul E. McKenney
2008-10-09  5:06 ` Peter Zijlstra
2008-10-09 12:31   ` Paul E. McKenney [this message]
2008-10-10  1:54     ` Zhang, Yanmin
2008-10-10  2:13       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081009123132.GF6628@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rjw@sisk.pl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.