public inbox for linux-kernel@vger.kernel.org
From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	Dinakar Guniguntala <dino@in.ibm.com>,
	Dmitry Adamushko <dmitry.adamushko@gmail.com>,
	suresh.b.siddha@intel.com, pwil3058@bigpond.net.au,
	clameter@sgi.com, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org
Subject: Re: v2.6.21.4-rt11
Date: Mon, 18 Jun 2007 20:42:15 +0530
Message-ID: <20070618151215.GA9750@linux.vnet.ibm.com>
In-Reply-To: <20070616161213.GA2994@linux.vnet.ibm.com>

On Sat, Jun 16, 2007 at 09:12:13AM -0700, Paul E. McKenney wrote:
> On Sat, Jun 16, 2007 at 02:14:34PM +0530, Srivatsa Vaddagiri wrote:
> > On Fri, Jun 15, 2007 at 06:16:05PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jun 15, 2007 at 09:55:45PM +0200, Ingo Molnar wrote:
> > > > 
> > > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > > > 
> > > > > > to make sure it's not some effect in -rt causing this. v17 has an 
> > > > > > updated load balancing code. (which might or might not affect the 
> > > > > > rcutorture problem.)
> > > > > 
> > > > > Good point!  I will try the following:
> > > > > 
> > > > > 1.	Stock 2.6.21.5.
> > > > > 
> > > > > 2.	2.6.21-rt14.
> > > > > 
> > > > > 3.	2.6.21.5 + sched-cfs-v2.6.21.5-v17.patch
> > > > > 
> > > > > And quickly, before everyone else jumps on the machines that show the 
> > > > > problem.  ;-)
> > > > 
> > > > thanks! It's enough to check whether modprobe rcutorture still produces 
> > > > that weird balancing problem. That clearly has to be fixed ...
> > > > 
> > > > And i've Cc:-ed Dmitry and Srivatsa, who are busy hacking this area of 
> > > > the CFS code as we speak :-)
> > > 
> > > Well, I am not sure that the info I was able to collect will be all
> > > that helpful, but it most certainly does confirm that the balancing
> > > problem that rcutorture produces is indeed weird...
> > 
> > Hi Paul, 
> > 	I tried on two machines in our lab and could not recreate your
> > problem.
> > 
> > On a 2way x86_64 AMD box and 2.6.21.5+cfsv17:
> > 
> > 
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 12395 root      39  19     0    0    0 R 50.3  0.0   0:57.62 rcu_torture_rea
> > 12394 root      39  19     0    0    0 R 49.9  0.0   0:57.29 rcu_torture_rea
> > 12396 root      39  19     0    0    0 R 49.9  0.0   0:56.96 rcu_torture_rea
> > 12397 root      39  19     0    0    0 R 49.9  0.0   0:56.90 rcu_torture_rea
> > 
> > On a 4way x86_64 Intel Xeon box and 2.6.21.5+cfsv17:
> > 
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
> >  6258 root      39  19     0    0    0 R   53  0.0  17:29.72 0 rcu_torture_rea
> >  6252 root      39  19     0    0    0 R   49  0.0  17:49.40 3 rcu_torture_rea
> >  6257 root      39  19     0    0    0 R   49  0.0  17:22.49 2 rcu_torture_rea
> >  6256 root      39  19     0    0    0 R   48  0.0  17:50.12 1 rcu_torture_rea
> >  6254 root      39  19     0    0    0 R   48  0.0  17:26.98 0 rcu_torture_rea
> >  6255 root      39  19     0    0    0 R   48  0.0  17:25.74 2 rcu_torture_rea
> >  6251 root      39  19     0    0    0 R   45  0.0  17:47.45 3 rcu_torture_rea
> >  6253 root      39  19     0    0    0 R   45  0.0  17:48.48 1 rcu_torture_rea
> > 
> > 
> > I will try this on few more boxes we have on Monday. If I can't recreate, then 
> > I may request you to provide me machine details (or even access to the problem 
> > box if it is in IBM labs and if I am allowed to login!)
> 
> elm3b6, ABAT job 95107.  There are others, this particular job uses
> 2.6.21.5-cfsv17.

Paul,
	I logged into elm3b6 and did some investigation. I think I have
a tentative patch to fix your load-balance problem.

First, an explanation of the problem:

This particular machine, elm3b6, is a 4-cpu, (gasp, yes!) 4-node box, i.e.
each CPU is a node by itself. If you don't have CONFIG_NUMA enabled,
then we won't have cross-node (i.e. cross-cpu) load balancing.
Fortunately, in your case you had CONFIG_NUMA enabled, but you were still
hitting the (gross) load imbalance.

The problem seems to be with idle_balance(). This routine, invoked by
schedule() on an idle cpu, walks up the sched-domain hierarchy and tries
to balance in each domain that has the SD_BALANCE_NEWIDLE flag set.
The node-level domain (SD_NODE_INIT), however, doesn't set this flag,
which means an idle cpu looks for (im)balance within its own node at most
and not beyond. Now, here's the problem: if the idle cpu doesn't find an
imbalance within its node (pulled_task = 0), it resets
this_rq->next_balance so that the next balancing activity is deferred for
up to a minute (next_balance = jiffies + 60 * HZ). If an idle cpu calls
idle_balance() again within that minute and finds no imbalance within its
node, it -again- resets next_balance. In your case, I think this was
happening repeatedly, which kept the other CPUs from ever looking for
cross-node (im)balance.
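
To make the deferral concrete, here is a rough user-space model of it. It
is only a sketch, not kernel code: struct toy_rq, toy_idle_balance_reset()
and toy_simulate() are made-up names, HZ = 1000 is an assumption, the reset
always uses the full minute, and plain comparisons stand in for
time_after(). It counts how often the periodic (cross-node) rebalance gets
to run when an idle cpu keeps resetting next_balance, and when it leaves
next_balance alone.

```c
/* Assumed tick rate for this sketch; the real HZ is a kernel config choice. */
#define HZ 1000UL

/* Hypothetical stand-in for the one runqueue field that matters here. */
struct toy_rq {
	unsigned long next_balance;	/* earliest jiffy for the periodic rebalance */
};

/*
 * Models the tail of idle_balance() when no task was pulled: the periodic
 * rebalance is pushed out by up to a minute.  The toy always uses the
 * full minute.
 */
static void toy_idle_balance_reset(struct toy_rq *rq, unsigned long now)
{
	rq->next_balance = now + 60 * HZ;
}

/*
 * Run "total" ticks.  The cpu goes idle (and finds nothing to pull within
 * its node) every "idle_period" ticks; idle_period must be non-zero.
 * Returns how many times the periodic, cross-node rebalance got to run.
 * Jiffies wraparound is ignored.
 */
static int toy_simulate(int reset_on_idle, unsigned long idle_period,
			unsigned long total)
{
	struct toy_rq rq = { .next_balance = 60 * HZ };
	unsigned long now;
	int runs = 0;

	for (now = 1; now <= total; now++) {
		/* Idle path: optionally defer the periodic rebalance again. */
		if (reset_on_idle && now % idle_period == 0)
			toy_idle_balance_reset(&rq, now);

		/* Periodic path: runs only once next_balance has been reached. */
		if (now >= rq.next_balance) {
			runs++;
			rq.next_balance = now + 60 * HZ;
		}
	}
	return runs;
}
```

With the cpu going idle every 10 seconds over a 10 minute run,
toy_simulate(1, 10 * HZ, 600 * HZ) returns 0, i.e. the cross-node rebalance
never runs, while toy_simulate(0, 10 * HZ, 600 * HZ) returns 10, once per
minute.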

I believe the patch below is correct. With the patch applied, I could
not recreate the imbalance with rcutorture. Let me know whether you
still see the problem with this patch applied on any other machine.

I have CCed others who have worked in this area and request them to review 
this patch.

Andrew,
	If there is no objection from anyone, please pick this up for
the next -mm release. It has been tested against 2.6.22-rc4-mm2.


idle_balance() can erroneously cause system-wide imbalance to be overlooked
by resetting rq->next_balance. When called often enough, it can defer
system-wide load balancing forever. The patch below modifies
idle_balance() so that it no longer touches ->next_balance. If it turns
out that there is no imbalance even system-wide, rebalance_domains() will
set ->next_balance to a minute from now anyway.


Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>


Index: linux-2.6.22-rc4/kernel/sched.c
===================================================================
--- linux-2.6.22-rc4.orig/kernel/sched.c	2007-06-18 07:16:49.000000000 -0700
+++ linux-2.6.22-rc4/kernel/sched.c	2007-06-18 07:18:41.000000000 -0700
@@ -2490,27 +2490,16 @@
 {
 	struct sched_domain *sd;
 	int pulled_task = 0;
-	unsigned long next_balance = jiffies + 60 *  HZ;
 
 	for_each_domain(this_cpu, sd) {
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
 			/* If we've pulled tasks over stop searching: */
 			pulled_task = load_balance_newidle(this_cpu,
 							this_rq, sd);
-			if (time_after(next_balance,
-				  sd->last_balance + sd->balance_interval))
-				next_balance = sd->last_balance
-						+ sd->balance_interval;
 			if (pulled_task)
 				break;
 		}
 	}
-	if (!pulled_task)
-		/*
-		 * We are going idle. next_balance may be set based on
-		 * a busy processor. So reset next_balance.
-		 */
-		this_rq->next_balance = next_balance;
 }
 
 /*


-- 
Regards,
vatsa
