From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
Dinakar Guniguntala <dino@in.ibm.com>,
Dmitry Adamushko <dmitry.adamushko@gmail.com>,
suresh.b.siddha@intel.com, pwil3058@bigpond.net.au,
clameter@sgi.com, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org
Subject: Re: v2.6.21.4-rt11
Date: Mon, 18 Jun 2007 20:42:15 +0530
Message-ID: <20070618151215.GA9750@linux.vnet.ibm.com>
In-Reply-To: <20070616161213.GA2994@linux.vnet.ibm.com>
On Sat, Jun 16, 2007 at 09:12:13AM -0700, Paul E. McKenney wrote:
> On Sat, Jun 16, 2007 at 02:14:34PM +0530, Srivatsa Vaddagiri wrote:
> > On Fri, Jun 15, 2007 at 06:16:05PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jun 15, 2007 at 09:55:45PM +0200, Ingo Molnar wrote:
> > > >
> > > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > > >
> > > > > > to make sure it's not some effect in -rt causing this. v17 has an
> > > > > > updated load balancing code. (which might or might not affect the
> > > > > > rcutorture problem.)
> > > > >
> > > > > Good point! I will try the following:
> > > > >
> > > > > 1. Stock 2.6.21.5.
> > > > >
> > > > > 2. 2.6.21-rt14.
> > > > >
> > > > > 3. 2.6.21.5 + sched-cfs-v2.6.21.5-v17.patch
> > > > >
> > > > > And quickly, before everyone else jumps on the machines that show the
> > > > > problem. ;-)
> > > >
> > > > thanks! It's enough to check whether modprobe rcutorture still produces
> > > > that weird balancing problem. That clearly has to be fixed ...
> > > >
> > > > And i've Cc:-ed Dmitry and Srivatsa, who are busy hacking this area of
> > > > the CFS code as we speak :-)
> > >
> > > Well, I am not sure that the info I was able to collect will be all
> > > that helpful, but it most certainly does confirm that the balancing
> > > problem that rcutorture produces is indeed weird...
> >
> > Hi Paul,
> > I tried on two machines in our lab and could not recreate your
> > problem.
> >
> > On a 2way x86_64 AMD box and 2.6.21.5+cfsv17:
> >
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 12395 root 39 19 0 0 0 R 50.3 0.0 0:57.62 rcu_torture_rea
> > 12394 root 39 19 0 0 0 R 49.9 0.0 0:57.29 rcu_torture_rea
> > 12396 root 39 19 0 0 0 R 49.9 0.0 0:56.96 rcu_torture_rea
> > 12397 root 39 19 0 0 0 R 49.9 0.0 0:56.90 rcu_torture_rea
> >
> > On a 4way x86_64 Intel Xeon box and 2.6.21.5+cfsv17:
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
> > 6258 root 39 19 0 0 0 R 53 0.0 17:29.72 0 rcu_torture_rea
> > 6252 root 39 19 0 0 0 R 49 0.0 17:49.40 3 rcu_torture_rea
> > 6257 root 39 19 0 0 0 R 49 0.0 17:22.49 2 rcu_torture_rea
> > 6256 root 39 19 0 0 0 R 48 0.0 17:50.12 1 rcu_torture_rea
> > 6254 root 39 19 0 0 0 R 48 0.0 17:26.98 0 rcu_torture_rea
> > 6255 root 39 19 0 0 0 R 48 0.0 17:25.74 2 rcu_torture_rea
> > 6251 root 39 19 0 0 0 R 45 0.0 17:47.45 3 rcu_torture_rea
> > 6253 root 39 19 0 0 0 R 45 0.0 17:48.48 1 rcu_torture_rea
> >
> >
> > I will try this on few more boxes we have on Monday. If I can't recreate, then
> > I may request you to provide me machine details (or even access to the problem
> > box if it is in IBM labs and if I am allowed to login!)
>
> elm3b6, ABAT job 95107. There are others, this particular job uses
> 2.6.21.5-cfsv17.

Paul,
	I logged into elm3b6 and did some investigation. I think I have
a tentative patch to fix your load-balance problem.

First, an explanation of the problem:

This particular machine, elm3b6, is a 4-cpu, (gasp, yes!) 4-node box, i.e.
each CPU is a node by itself. If you don't have CONFIG_NUMA enabled,
then we won't have cross-node (i.e. cross-cpu) load balancing.
Fortunately in your case you had CONFIG_NUMA enabled, but still were
hitting the (gross) load imbalance.

The problem seems to be with idle_balance(). This particular routine,
invoked by schedule() on an idle cpu, walks up the sched-domain hierarchy and
tries to balance in each domain that has the SD_BALANCE_NEWIDLE flag set.
The node-level domain (SD_NODE_INIT) however doesn't set this flag,
which means an idle cpu looks for (im)balance within its own node at most and
not beyond. Now, here's the problem: if the idle cpu doesn't find
imbalance within its node (pulled_task = 0), it resets this_rq->next_balance
so that the next balancing activity is deferred for up to a minute
(next_balance = jiffies + 60 * HZ). If an idle cpu calls idle_balance()
again in the next minute and finds no imbalance within its node, it
-again- resets next_balance. In your case, I think this was happening
repeatedly, which made other CPUs never look for cross-node
(im)balance.

I believe the patch below is correct. With the patch applied, I could
not recreate the imbalance with rcutorture. Let me know whether you
still see the problem with this patch applied on any other machine.

I have CCed others who have worked in this area and request them to review
this patch.

Andrew,
	If there is no objection from anyone, I request you to pick this
up for the next -mm release. It has been tested against 2.6.22-rc4-mm2.


idle_balance() can erroneously cause a system-wide imbalance to be overlooked
by resetting rq->next_balance. When called a sufficient number of times, it
can defer system-wide load balancing forever. The patch below modifies
idle_balance() not to mess with ->next_balance. If it turns out
that there is indeed no imbalance even system-wide, rebalance_domains() will
anyway set ->next_balance to happen after a minute.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

Index: linux-2.6.22-rc4/kernel/sched.c
===================================================================
--- linux-2.6.22-rc4.orig/kernel/sched.c 2007-06-18 07:16:49.000000000 -0700
+++ linux-2.6.22-rc4/kernel/sched.c 2007-06-18 07:18:41.000000000 -0700
@@ -2490,27 +2490,16 @@
 {
 	struct sched_domain *sd;
 	int pulled_task = 0;
-	unsigned long next_balance = jiffies + 60 * HZ;
 
 	for_each_domain(this_cpu, sd) {
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
 			/* If we've pulled tasks over stop searching: */
 			pulled_task = load_balance_newidle(this_cpu,
 							this_rq, sd);
-			if (time_after(next_balance,
-				  sd->last_balance + sd->balance_interval))
-				next_balance = sd->last_balance
-						+ sd->balance_interval;
 			if (pulled_task)
 				break;
 		}
 	}
-	if (!pulled_task)
-		/*
-		 * We are going idle. next_balance may be set based on
-		 * a busy processor. So reset next_balance.
-		 */
-		this_rq->next_balance = next_balance;
 }
 
 /*
--
Regards,
vatsa