From: Max Krasnyansky <maxk@qualcomm.com>
To: Li Zefan <lizf@cn.fujitsu.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>,
Gregory Haskins <ghaskins@novell.com>,
Derek Fults <dfults@sgi.com>,
Peter Zijlstra <peterz@infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>
Subject: Re: RT sched: cpupri_vec lock contention with def_root_domain and no load balance
Date: Mon, 24 Nov 2008 13:46:14 -0800
Message-ID: <492B20A6.8050905@qualcomm.com>
In-Reply-To: <4927C055.8030009@cn.fujitsu.com>
Li Zefan wrote:
> Max Krasnyansky wrote:
>> Dimitri Sivanich wrote:
>>> kernel: CPU3 root domain e0000069ecb20000
>>> kernel: CPU3 attaching sched-domain:
>>> kernel: domain 0: span 3 level NODE
>>> kernel: groups: 3
>>> kernel: CPU2 root domain e000006884a00000
>>> kernel: CPU2 attaching sched-domain:
>>> kernel: domain 0: span 2 level NODE
>>> kernel: groups: 2
>>> kernel: CPU1 root domain e000006884a20000
>>> kernel: CPU1 attaching sched-domain:
>>> kernel: domain 0: span 1 level NODE
>>> kernel: groups: 1
>>> kernel: CPU0 root domain e000006884a40000
>>> kernel: CPU0 attaching sched-domain:
>>> kernel: domain 0: span 0 level NODE
>>> kernel: groups: 0
>>>
>>> Which is the way sched_load_balance is supposed to work. You need to set
>>> sched_load_balance=0 for all cpusets containing any cpu you want to disable
>>> balancing on, otherwise some balancing will happen.
>> There won't be much balancing in this case, because there is just one CPU per
>> domain.
>> In other words, no, that's not how it's supposed to work. There is code in
>> cpu_attach_domain() that is supposed to remove redundant levels
>> (the sd_degenerate() stuff). There is an explicit check in there for numcpus == 1.
>> BTW, the reason you got a different result than I did is that you have a
>> NUMA box, whereas mine is UMA. I was able to reproduce the problem, though, by
>> enabling the multi-core scheduler, in which case I also get one redundant domain
>> at level CPU, with a single CPU in it.
>> So we definitely need to fix this. I'll try to poke around tomorrow and figure
>> out why the redundant level is not dropped.
>>
>
> You were not using the latest kernel, were you?
>
> There was a bug in the sd degenerate code, and it has already been fixed:
> http://lkml.org/lkml/2008/11/8/10
Ah, makes sense.
The funny part is that I did see the patch before but completely forgot
about it :).
>>> So when we do that for just par3, we get the following:
>>> echo 0 > par3/cpuset.sched_load_balance
>>> kernel: cpusets: rebuild ndoms 3
>>> kernel: cpuset: domain 0 cpumask
>>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>>> kernel: cpuset: domain 1 cpumask
>>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>>> kernel: cpuset: domain 2 cpumask
>>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
>>> kernel: CPU3 root domain default
>>> kernel: CPU3 attaching NULL sched-domain.
>>>
>>> So the def_root_domain is now attached for CPU 3. And we do have a NULL
>>> sched-domain, which we expect for a cpu with load balancing turned off. If
>>> we turn sched_load_balance off ('0') on each of the other cpusets (par0-2),
>>> each of those cpus would also have a NULL sched-domain attached.
>> OK. This one is a bug in cpuset.c:generate_sched_domains(). The sched domain
>> generator in cpusets should not drop domains with a single CPU in them when
>> sched_load_balance==0. I'll look at that tomorrow too.
>>
>
> Do you mean the correct behavior should be as follows?
> kernel: cpusets: rebuild ndoms 4
Yes.
> But why do you think this is a bug? In generate_sched_domains(), cpusets with
> sched_load_balance==0 will be skipped:
>
> list_add(&top_cpuset.stack_list, &q);
> while (!list_empty(&q)) {
> 	...
> 	if (is_sched_load_balance(cp)) {
> 		csa[csn++] = cp;
> 		continue;
> 	}
> 	...
> }
>
> Correct me if I misunderstood your point.
The problem is that all CPUs in cpusets with sched_load_balance==0 end
up in the default root_domain (def_root_domain), which causes contention
on its cpupri_vec locks.
We can fix it either in sched.c:partition_sched_domains() or in
cpuset.c:generate_sched_domains(). I'd rather fix cpusets, because a
sched.c fix would be sub-optimal. See my answer to Greg in the same
thread. Basically, the scheduler code would have to allocate a
root_domain for each CPU, even in transitional states. So I'd rather fix
cpusets to generate a domain for each non-overlapping cpuset, regardless of
the sched_load_balance flag.
Max