From: Max Krasnyansky <maxk@qualcomm.com>
To: Dimitri Sivanich <sivanich@sgi.com>
Cc: Gregory Haskins <ghaskins@novell.com>,
Derek Fults <dfults@sgi.com>,
Peter Zijlstra <peterz@infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>
Subject: Re: RT sched: cpupri_vec lock contention with def_root_domain and no load balance
Date: Fri, 21 Nov 2008 23:03:38 -0800
Message-ID: <4927AECA.2040707@qualcomm.com>
In-Reply-To: <20081121211800.GA16647@sgi.com>

Dimitri Sivanich wrote:
> Hi Greg and Max,
>
> On Fri, Nov 21, 2008 at 12:04:25PM -0800, Max Krasnyansky wrote:
>> Hi Greg,
>>
>> I attached debug instrumentation patch for Dmitri to try. I'll clean it up and
>> add things you requested and will resubmit properly some time next week.
>>
>
> We added Max's debug patch to our kernel and have run Max's Trace 3 scenario, but we do not see a NULL sched-domain remain attached, see my comments below.
>
>
> mount -t cgroup cpuset -ocpuset /cpusets/
>
> for i in 0 1 2 3; do mkdir par$i; echo $i > par$i/cpuset.cpus; done
>
> kernel: cpusets: rebuild ndoms 1
> kernel: cpuset: domain 0 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
Oops. I did not realize your NR_CPUS is so large. Unfortunately, all your masks
got truncated.
I'll update the patch to print a cpu list instead of the masks.
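
For illustration only, here is a minimal user-space sketch of what printing a
cpu list instead of a raw mask could look like. It is not the debug patch
itself: format_cpu_list() and its one-byte-per-CPU input are hypothetical,
chosen only to show how a wide mask collapses into a short range list that
does not get truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel API): render the CPUs marked in
 * cpu_set[] as a compact range list such as "0-3,8,10-11".
 * cpu_set[i] != 0 means CPU i is a member of the set.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_cpu_list(const unsigned char *cpu_set, int ncpus,
                           char *buf, size_t len)
{
    size_t used = 0;
    int cpu = 0;

    assert(cpu_set != NULL && buf != NULL);
    if (len == 0)
        return -1;
    buf[0] = '\0';

    while (cpu < ncpus) {
        int start, end, n;

        if (!cpu_set[cpu]) {
            cpu++;
            continue;
        }

        /* Extend the run for as long as consecutive CPUs are set. */
        start = cpu;
        while (cpu + 1 < ncpus && cpu_set[cpu + 1])
            cpu++;
        end = cpu++;

        if (start == end)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", start);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", start, end);

        if (n < 0 || (size_t)n >= len - used)
            return -1; /* output would not fit */
        used += (size_t)n;
    }
    return (int)used;
}
```

With a list like this, a single-cpu domain prints as one short number
regardless of how large NR_CPUS is.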
> echo 0 > cpuset.sched_load_balance
> kernel: cpusets: rebuild ndoms 4
> kernel: cpuset: domain 0 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: cpuset: domain 1 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: cpuset: domain 2 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: cpuset: domain 3 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: CPU0 root domain default
> kernel: CPU0 attaching NULL sched-domain.
> kernel: CPU1 root domain default
> kernel: CPU1 attaching NULL sched-domain.
> kernel: CPU2 root domain default
> kernel: CPU2 attaching NULL sched-domain.
> kernel: CPU3 root domain default
> kernel: CPU3 attaching NULL sched-domain.
> kernel: CPU3 root domain e0000069ecb20000
> kernel: CPU3 attaching sched-domain:
> kernel: domain 0: span 3 level NODE
> kernel: groups: 3
> kernel: CPU2 root domain e000006884a00000
> kernel: CPU2 attaching sched-domain:
> kernel: domain 0: span 2 level NODE
> kernel: groups: 2
> kernel: CPU1 root domain e000006884a20000
> kernel: CPU1 attaching sched-domain:
> kernel: domain 0: span 1 level NODE
> kernel: groups: 1
> kernel: CPU0 root domain e000006884a40000
> kernel: CPU0 attaching sched-domain:
> kernel: domain 0: span 0 level NODE
> kernel: groups: 0
>
> Which is the way sched_load_balance is supposed to work. You need to set
> sched_load_balance=0 for all cpusets containing any cpu you want to disable
> balancing on, otherwise some balancing will happen.
It won't be much balancing in this case because there is just one cpu per
domain.
In other words, no, that's not how it is supposed to work. There is code in
cpu_attach_domain() that is supposed to remove redundant levels (the
sd_degenerate() stuff). There is an explicit check in there for numcpus == 1.
btw, the reason you got a different result than I did is that you have a
NUMA box whereas mine is UMA. I was able to reproduce the problem, though, by
enabling the multi-core scheduler, in which case I also get one redundant
domain level, CPU, with a single CPU in it.
So we definitely need to fix this. I'll try to poke around tomorrow and figure
out why the redundant level is not dropped.
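
To make the expected behaviour concrete, here is a simplified user-space model
of dropping redundant levels. It is a sketch under stated assumptions, not the
kernel code: struct toy_domain, toy_degenerate() and toy_prune() are
hypothetical names, and the model covers only the single-CPU check mentioned
above, not the other conditions the real sd_degenerate() looks at.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for one sched-domain level (hypothetical, not kernel code). */
struct toy_domain {
    unsigned long span;          /* bit i set => CPU i belongs to this level */
    struct toy_domain *parent;   /* next level up, or NULL at the top */
};

/* Count the CPUs in a span. */
static int span_weight(unsigned long span)
{
    int weight = 0;

    while (span) {
        weight += (int)(span & 1UL);
        span >>= 1;
    }
    return weight;
}

/* A level spanning exactly one CPU has nothing to balance. */
static int toy_degenerate(const struct toy_domain *sd)
{
    assert(sd != NULL);
    return span_weight(sd->span) == 1;
}

/*
 * Return the hierarchy that would be attached to a CPU after every
 * single-CPU level has been dropped. A NULL result corresponds to the
 * "attaching NULL sched-domain" case in the logs.
 */
static struct toy_domain *toy_prune(struct toy_domain *sd)
{
    struct toy_domain *tmp;

    /* Skip degenerate levels at the bottom of the hierarchy. */
    while (sd != NULL && toy_degenerate(sd))
        sd = sd->parent;

    /* Unlink degenerate levels further up. */
    tmp = sd;
    while (tmp != NULL) {
        struct toy_domain *parent = tmp->parent;

        if (parent != NULL && toy_degenerate(parent))
            tmp->parent = parent->parent;
        else
            tmp = parent;
    }
    return sd;
}
```

In this model, a hierarchy consisting of one level that spans only CPU3 prunes
to NULL, which is the result expected for the single-cpu par3 cpuset without
touching its sched_load_balance flag.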
> So in addition to the top (root) cpuset, we need to set it to '0' in the
> parX cpusets. That will turn off load balancing to the cpus in question
> (thereby attaching a NULL sched domain).
As I explained above, we should not have to disable load balancing in cpusets
with a single CPU.
> So when we do that for just par3, we get the following:
> echo 0 > par3/cpuset.sched_load_balance
> kernel: cpusets: rebuild ndoms 3
> kernel: cpuset: domain 0 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: cpuset: domain 1 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: cpuset: domain 2 cpumask
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
> 0000000,00000000,00000000,00000000,0
> kernel: CPU3 root domain default
> kernel: CPU3 attaching NULL sched-domain.
>
> So the def_root_domain is now attached for CPU 3. And we do have a NULL
> sched-domain, which we expect for a cpu with load balancing turned off. If
> we turn sched_load_balance off ('0') on each of the other cpusets (par0-2),
> each of those cpus would also have a NULL sched-domain attached.
Ok. This one is a bug in cpuset.c:generate_sched_domains(). The sched domain
generator in cpusets should not drop domains with a single cpu in them when
sched_load_balance==0. I'll look at that tomorrow too.
Max