public inbox for linux-kernel@vger.kernel.org
From: Max Krasnyansky <maxk@qualcomm.com>
To: Dimitri Sivanich <sivanich@sgi.com>
Cc: Gregory Haskins <ghaskins@novell.com>,
	Derek Fults <dfults@sgi.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: RT sched: cpupri_vec lock contention with def_root_domain and no load balance
Date: Fri, 21 Nov 2008 23:03:38 -0800
Message-ID: <4927AECA.2040707@qualcomm.com>
In-Reply-To: <20081121211800.GA16647@sgi.com>



Dimitri Sivanich wrote:
> Hi Greg and Max,
> 
> On Fri, Nov 21, 2008 at 12:04:25PM -0800, Max Krasnyansky wrote:
>> Hi Greg,
>>
>> I attached a debug instrumentation patch for Dimitri to try. I'll clean it up,
>> add the things you requested, and resubmit it properly some time next week.
>>
> 
> We added Max's debug patch to our kernel and have run Max's Trace 3 scenario, but we do not see a NULL sched-domain remain attached, see my comments below.
> 
> 
> mount -t cgroup cpuset -ocpuset /cpusets/
> 
> for i in 0 1 2 3; do mkdir par$i; echo $i > par$i/cpuset.cpus; done
> 
> kernel: cpusets: rebuild ndoms 1
> kernel: cpuset: domain 0 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
Oops. I did not realize your NR_CPUS is so large. Unfortunately, all your masks
got truncated.
I'll update the patch to print the CPU list instead of the masks.
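
To illustrate why a CPU list reads better than a raw mask on a large-NR_CPUS
box, here is a minimal userspace sketch. It is not the debug patch and not the
kernel's own helper; format_cpu_list() is a hypothetical name, and the sketch
assumes the mask fits in 64 bits.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical userspace helper: render the set bits of a 64-bit mask
 * as a compact CPU list such as "0-3,8". Returns the number of
 * characters written, excluding the terminating NUL.
 */
static size_t format_cpu_list(unsigned long long mask, char *buf, size_t len)
{
    size_t used = 0;
    int cpu = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    while (cpu < 64) {
        int start, end, n;

        /* Skip CPUs that are not in the mask. */
        if (!(mask & (1ULL << cpu))) {
            cpu++;
            continue;
        }

        /* Extend the run while consecutive CPUs are set. */
        start = cpu;
        while (cpu < 64 && (mask & (1ULL << cpu)))
            cpu++;
        end = cpu - 1;

        if (start == end)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", start);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", start, end);

        /* On truncation, drop the partial entry and stop. */
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

With this, a mask of 0xf comes out as "0-3", which stays short no matter how
wide the mask is.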

> echo 0 > cpuset.sched_load_balance
> kernel: cpusets: rebuild ndoms 4
> kernel: cpuset: domain 0 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: cpuset: domain 1 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: cpuset: domain 2 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: cpuset: domain 3 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: CPU0 root domain default
> kernel: CPU0 attaching NULL sched-domain.
> kernel: CPU1 root domain default
> kernel: CPU1 attaching NULL sched-domain.
> kernel: CPU2 root domain default
> kernel: CPU2 attaching NULL sched-domain.
> kernel: CPU3 root domain default
> kernel: CPU3 attaching NULL sched-domain.

> kernel: CPU3 root domain e0000069ecb20000
> kernel: CPU3 attaching sched-domain:
> kernel:  domain 0: span 3 level NODE
> kernel:   groups: 3
> kernel: CPU2 root domain e000006884a00000
> kernel: CPU2 attaching sched-domain:
> kernel:  domain 0: span 2 level NODE
> kernel:   groups: 2
> kernel: CPU1 root domain e000006884a20000
> kernel: CPU1 attaching sched-domain:
> kernel:  domain 0: span 1 level NODE
> kernel:   groups: 1
> kernel: CPU0 root domain e000006884a40000
> kernel: CPU0 attaching sched-domain:
> kernel:  domain 0: span 0 level NODE
> kernel:   groups: 0
> 
> Which is the way sched_load_balance is supposed to work. You need to set
> sched_load_balance=0 for all cpusets containing any cpu you want to disable
> balancing on, otherwise some balancing will happen.
There won't be much balancing in this case because there is just one CPU per
domain.
In other words, no, that's not how it is supposed to work. There is code in
cpu_attach_domain() that is supposed to remove redundant levels
(the sd_degenerate() stuff). There is an explicit check in there for numcpus == 1.
Btw, the reason you got a different result than I did is that you have a
NUMA box whereas mine is UMA. I was able to reproduce the problem, though, by
enabling the multi-core scheduler. In that case I also get one redundant domain
level, CPU, with a single CPU in it.
So we definitely need to fix this. I'll try to poke around tomorrow and figure
out why the redundant level is not dropped.
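
A rough sketch of the idea behind dropping redundant levels, assuming a much
simplified model: struct domain_level, span_weight(), level_is_degenerate() and
strip_degenerate() are hypothetical names, and the only criterion modelled here
is the single-CPU span. The real sd_degenerate() logic looks at more than that.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for one sched-domain level. */
struct domain_level {
    unsigned long long span;      /* bitmask of CPUs covered by this level */
    struct domain_level *parent;  /* next level up, or NULL at the top */
};

/* Count the CPUs in a span by clearing the lowest set bit each pass. */
static int span_weight(unsigned long long span)
{
    int n = 0;

    for (; span; span &= span - 1)
        n++;
    return n;
}

/* A level spanning exactly one CPU has nothing to balance against. */
static bool level_is_degenerate(const struct domain_level *lvl)
{
    return span_weight(lvl->span) == 1;
}

/*
 * Return the first non-degenerate level at or above 'lvl', or NULL if
 * every level is redundant (the "NULL sched-domain" case in the log).
 */
static struct domain_level *strip_degenerate(struct domain_level *lvl)
{
    while (lvl && level_is_degenerate(lvl))
        lvl = lvl->parent;
    return lvl;
}
```

Under this model, a lone level whose span is just CPU3 collapses to NULL, which
is the outcome expected for a single-CPU cpuset.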

> So in addition to the top (root) cpuset, we need to set it to '0' in the
> parX cpusets. That will turn off load balancing to the cpus in question
> (thereby attaching a NULL sched domain). 
As I explained above, we should not have to disable load balancing in cpusets
with a single CPU.

> So when we do that for just par3, we get the following:
> echo 0 > par3/cpuset.sched_load_balance
> kernel: cpusets: rebuild ndoms 3
> kernel: cpuset: domain 0 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: cpuset: domain 1 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: cpuset: domain 2 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: CPU3 root domain default
> kernel: CPU3 attaching NULL sched-domain.
> 
> So the def_root_domain is now attached for CPU 3.  And we do have a NULL
> sched-domain, which we expect for a cpu with load balancing turned off.  If
> we turn sched_load_balance off ('0') on each of the other cpusets (par0-2),
> each of those cpus would also have a NULL sched-domain attached.
Ok. This one is a bug in cpuset.c:generate_sched_domains(). The sched domain
generator in cpusets should not drop domains with a single CPU in them when
sched_load_balance==0. I'll look at that tomorrow too.

Max

