From: Li Zefan <lizf@cn.fujitsu.com>
To: Max Krasnyansky <maxk@qualcomm.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>,
Gregory Haskins <ghaskins@novell.com>,
Derek Fults <dfults@sgi.com>,
Peter Zijlstra <peterz@infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>
Subject: Re: RT sched: cpupri_vec lock contention with def_root_domain and no load balance
Date: Sat, 22 Nov 2008 16:18:29 +0800
Message-ID: <4927C055.8030009@cn.fujitsu.com>
In-Reply-To: <4927AECA.2040707@qualcomm.com>
Max Krasnyansky wrote:
>
> Dimitri Sivanich wrote:
>> Hi Greg and Max,
>>
>> On Fri, Nov 21, 2008 at 12:04:25PM -0800, Max Krasnyansky wrote:
>>> Hi Greg,
>>>
>>> I attached a debug instrumentation patch for Dimitri to try. I'll clean it up,
>>> add the things you requested, and resubmit properly some time next week.
>>>
>> We added Max's debug patch to our kernel and have run Max's Trace 3 scenario, but we do not see a NULL sched-domain remain attached, see my comments below.
>>
>>
>> mount -t cgroup cpuset -ocpuset /cpusets/
>>
>> for i in 0 1 2 3; do mkdir par$i; echo $i > par$i/cpuset.cpus; done
>>
>> kernel: cpusets: rebuild ndoms 1
>> kernel: cpuset: domain 0 cpumask
>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>> 0000000,00000000,00000000,00000000,0
> Oops. I did not realize your NR_CPUS is so large. Unfortunately all your masks
> got truncated.
> I'll update the patch to print cpu list instead of the masks.
>
>> echo 0 > cpuset.sched_load_balance
>> kernel: cpusets: rebuild ndoms 4
>> kernel: cpuset: domain 0 cpumask
>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>> 0000000,00000000,00000000,00000000,0
>> kernel: cpuset: domain 1 cpumask
>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>> 0000000,00000000,00000000,00000000,0
>> kernel: cpuset: domain 2 cpumask
>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>> 0000000,00000000,00000000,00000000,0
>> kernel: cpuset: domain 3 cpumask
>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>> 0000000,00000000,00000000,00000000,0
>> kernel: CPU0 root domain default
>> kernel: CPU0 attaching NULL sched-domain.
>> kernel: CPU1 root domain default
>> kernel: CPU1 attaching NULL sched-domain.
>> kernel: CPU2 root domain default
>> kernel: CPU2 attaching NULL sched-domain.
>> kernel: CPU3 root domain default
>> kernel: CPU3 attaching NULL sched-domain.
>
>> kernel: CPU3 root domain e0000069ecb20000
>> kernel: CPU3 attaching sched-domain:
>> kernel: domain 0: span 3 level NODE
>> kernel: groups: 3
>> kernel: CPU2 root domain e000006884a00000
>> kernel: CPU2 attaching sched-domain:
>> kernel: domain 0: span 2 level NODE
>> kernel: groups: 2
>> kernel: CPU1 root domain e000006884a20000
>> kernel: CPU1 attaching sched-domain:
>> kernel: domain 0: span 1 level NODE
>> kernel: groups: 1
>> kernel: CPU0 root domain e000006884a40000
>> kernel: CPU0 attaching sched-domain:
>> kernel: domain 0: span 0 level NODE
>> kernel: groups: 0
>>
>> Which is the way sched_load_balance is supposed to work. You need to set
>> sched_load_balance=0 for all cpusets containing any cpu you want to disable
>> balancing on, otherwise some balancing will happen.
> There won't be much balancing in this case because there is just one cpu per
> domain.
> In other words, no, that's not how it is supposed to work. There is code in
> cpu_attach_domain() that is supposed to remove redundant levels
> (sd_degenerate() stuff). There is an explicit check in there for numcpus == 1.
> btw, the reason you got a different result than I did is that you have a
> NUMA box whereas mine is UMA. I was able to reproduce the problem, though, by
> enabling the multi-core scheduler, in which case I also get one redundant domain
> level, CPU, with a single CPU in it.
> So we definitely need to fix this. I'll try to poke around tomorrow and figure
> out why the redundant level is not dropped.
>
You were not using the latest kernel, were you?
There was a bug in the sd degenerate code, and it has already been fixed:
http://lkml.org/lkml/2008/11/8/10
>> So in addition to the top (root) cpuset, we need to set it to '0' in the
>> parX cpusets. That will turn off load balancing to the cpus in question
>> (thereby attaching a NULL sched domain).
> As I explained above we should not have to disable load balancing in cpusets
> with a single CPU.
>
Yes, and please try the latest kernel. ;)
>> So when we do that for just par3, we get the following:
>> echo 0 > par3/cpuset.sched_load_balance
>> kernel: cpusets: rebuild ndoms 3
>> kernel: cpuset: domain 0 cpumask
>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>> 0000000,00000000,00000000,00000000,0
>> kernel: cpuset: domain 1 cpumask
>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>> 0000000,00000000,00000000,00000000,0
>> kernel: cpuset: domain 2 cpumask
>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>> 0000000,00000000,00000000,00000000,0
>> kernel: CPU3 root domain default
>> kernel: CPU3 attaching NULL sched-domain.
>>
>> So the def_root_domain is now attached for CPU 3. And we do have a NULL
>> sched-domain, which we expect for a cpu with load balancing turned off. If
>> we turn sched_load_balance off ('0') on each of the other cpusets (par0-2),
>> each of those cpus would also have a NULL sched-domain attached.
> Ok. This one is a bug in cpuset.c:generate_sched_domains(). The sched domain
> generator in cpusets should not drop domains with a single cpu in them when
> sched_load_balance==0. I'll look at that tomorrow too.
>
Do you mean the correct behavior should be as follows?
kernel: cpusets: rebuild ndoms 4
But why do you think this is a bug? In generate_sched_domains(), cpusets with
sched_load_balance==0 will be skipped:
	list_add(&top_cpuset.stack_list, &q);
	while (!list_empty(&q)) {
		...
		if (is_sched_load_balance(cp)) {
			csa[csn++] = cp;
			continue;
		}
		...
	}
Correct me if I misunderstood your point.
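
To make the traversal above concrete, here is a minimal user-space sketch in C.
It is not the kernel code: struct toy_cpuset, collect_balanced() and
MAX_CPUSETS are hypothetical names made up for illustration, it uses a plain
array as the work stack instead of the kernel list, and the later step in which
generate_sched_domains() merges overlapping cpusets into partitions is left
out. It only models the rule in the excerpt: a cpuset with sched_load_balance
set is collected, and one with the flag clear is passed over. The descent into
child cpusets is my reading of the elided part of the loop, so treat it as an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUSETS 16	/* hypothetical bound for this toy model */

/* Hypothetical, much simplified stand-in for the kernel's struct cpuset. */
struct toy_cpuset {
	const char *name;
	bool sched_load_balance;	/* models cpuset.sched_load_balance */
	unsigned long cpus;		/* bitmask of cpus in this cpuset */
	struct toy_cpuset *children[MAX_CPUSETS];
	size_t nr_children;
};

/*
 * Walk the hierarchy from 'top' and store in 'csa' every cpuset that would
 * seed a sched domain. A cpuset with the flag set is collected and its
 * subtree is not visited. A cpuset with the flag clear is not collected;
 * its children are pushed so they get examined instead (assumption, see
 * the text above). Returns the number of cpusets collected.
 */
size_t collect_balanced(struct toy_cpuset *top,
			struct toy_cpuset **csa, size_t cap)
{
	struct toy_cpuset *stack[MAX_CPUSETS];
	size_t sp = 0, csn = 0;

	stack[sp++] = top;
	while (sp > 0) {
		struct toy_cpuset *cp = stack[--sp];

		if (cp->sched_load_balance) {
			assert(csn < cap);
			csa[csn++] = cp;
			continue;
		}

		for (size_t i = 0; i < cp->nr_children; i++) {
			assert(sp < MAX_CPUSETS);
			stack[sp++] = cp->children[i];
		}
	}
	return csn;
}
```

With the four single-cpu cpusets par0..par3 from the trace placed under a top
cpuset whose flag is clear, this model collects four cpusets, and three once
the flag in par3 is cleared as well. Because those cpusets do not overlap, that
lines up with the "rebuild ndoms 4" and "rebuild ndoms 3" lines in the log.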