From: Gregory Haskins <ghaskins@novell.com>
To: Dimitri Sivanich <sivanich@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>
Subject: Re: RT sched: cpupri_vec lock contention with def_root_domain and no load balance
Date: Tue, 04 Nov 2008 09:59:24 -0500
Message-ID: <4910634C.1020207@novell.com>
In-Reply-To: <20081104144017.GB30855@sgi.com>
Dimitri Sivanich wrote:
> On Tue, Nov 04, 2008 at 03:36:33PM +0100, Peter Zijlstra wrote:
>
>> On Tue, 2008-11-04 at 09:34 -0500, Gregory Haskins wrote:
>>
>>> Gregory Haskins wrote:
>>>
>>>> Peter Zijlstra wrote:
>>>>>
>>>>> On Mon, 2008-11-03 at 15:07 -0600, Dimitri Sivanich wrote:
>>>>>>
>>>>>> When load balancing gets switched off for a set of cpus via the
>>>>>> sched_load_balance flag in cpusets, those cpus wind up with the
>>>>>> globally defined def_root_domain attached. The def_root_domain is
>>>>>> attached when partition_sched_domains calls detach_destroy_domains().
>>>>>> A new root_domain is never allocated or attached, since
>>>>>> __build_sched_domains() never attaches a sched domain for the
>>>>>> non-load-balanced processors.
>>>>>>
>>>>>> The problem with this scenario is that on systems with a large number
>>>>>> of processors with load balancing switched off, we start to see the
>>>>>> cpupri->pri_to_cpu->lock in the def_root_domain becoming contended.
>>>>>> This starts to become much more apparent above 8 waking RT threads
>>>>>> (with each RT thread running on its own cpu, blocking and waking up
>>>>>> continuously).
>>>>>>
>>>>>> I'm wondering if this is, in fact, the way things were meant to work,
>>>>>> or should we have a root domain allocated for each cpu that is not to
>>>>>> be part of a sched domain? Note that the def_root_domain spans all of
>>>>>> the non-load-balanced cpus in this case. Having it attached to cpus
>>>>>> that should not be load balancing doesn't quite make sense to me.
>>>>>>
>>>>> It shouldn't be like that, each load-balance domain (in your case a
>>>>> single cpu) should get its own root domain. Gregory?
>>>>>
>>>> Yeah, this sounds broken. I know that the root-domain code was being
>>>> developed coincident with some upheaval in the cpuset code, so I suspect
>>>> something may have been broken from the original intent. I will take a
>>>> look.
>>>>
>>>> -Greg
>>>>
>>> After thinking about it some more, I am not quite sure what to do here.
>>> The root-domain code was really designed to be 1:1 with a disjoint
>>> cpuset. In this case, it sounds like all the non-balanced cpus are
>>> still in one default cpuset. In that case, the code is correct to place
>>> all those cores in the singleton def_root_domain. The question really
>>> is: How do we support the sched_load_balance flag better?
>>>
>>> I suppose we could go through the scheduler code and have it check that
>>> flag before consulting the root-domain. Another alternative is to have
>>> the sched_load_balance=false flag create a disjoint cpuset. Any thoughts?
>>>
>> Hmm, but you cannot disable load-balance on a cpu without placing it in
>> a cpuset first, right?
>>
>> Or are folks disabling load-balance bottom-up, instead of top-down?
>>
>> In that case, I think we should disallow that.
>>
>
> When I see this behavior, I am creating cpusets containing these
> non-load-balancing cpus. Whether I create a single cpuset for each one,
> or one cpuset for all of them, the root domain ends up being the
> def_root_domain with no sched domain attached once I set both the root
> cpuset's and the created cpusets' sched_load_balance flags to 0.
>
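Before getting to the cpuset question: the contention you measured is
consistent with how the RT state is laid out. Roughly (a simplified,
from-memory sketch in plain C, not the actual kernel source; the
stand-in typedefs and constants below are only illustrative), everything
hangs off the root domain:

/*
 * Stand-ins so the sketch is self-contained; the real types live in
 * <linux/spinlock.h> and <linux/cpumask.h>.
 */
typedef struct { volatile int slock; } spinlock_t;
typedef struct { unsigned long bits[64]; } cpumask_t;

#define CPUPRI_NR_PRIORITIES 102   /* roughly MAX_RT_PRIO + 2 */
#define NR_CPUS              4096

/*
 * One vector per RT priority level: which cpus currently run a task
 * at that priority. The per-vec lock is the cpupri->pri_to_cpu->lock
 * you see contended.
 */
struct cpupri_vec {
        spinlock_t lock;
        int        count;
        cpumask_t  mask;
};

struct cpupri {
        struct cpupri_vec pri_to_cpu[CPUPRI_NR_PRIORITIES];
        int               cpu_to_pri[NR_CPUS];
};

/*
 * One instance per disjoint load-balance domain, plus the single
 * global def_root_domain as the fallback.
 */
struct root_domain {
        cpumask_t     span;      /* cpus covered by this domain */
        cpumask_t     rto_mask;  /* cpus with >1 runnable RT task */
        struct cpupri cpupri;
};

Every cpu attached to the same root_domain updates the same
pri_to_cpu[] vectors on each RT wakeup, so the wider the
def_root_domain span, the worse the serialization gets.
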
If you tried creating separate cpusets and the cpus still all ended up
in the def_root_domain, something is very broken indeed. I will take a
look.
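
In the meantime, here is roughly how I will try to reproduce it
(untested sketch; assumes the cpuset filesystem is mounted at
/dev/cpuset, and the "isolated" set name plus the cpu/mem values are
just examples):

/*
 * Carve cpu 3 into its own cpuset and clear sched_load_balance on
 * both it and the root set. Run as root.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_str(const char *path, const char *val)
{
        int fd = open(path, O_WRONLY);

        if (fd < 0 || write(fd, val, strlen(val)) < 0)
                perror(path);
        if (fd >= 0)
                close(fd);
}

int main(void)
{
        /* Turn balancing off in the root set first... */
        write_str("/dev/cpuset/sched_load_balance", "0");

        /* ...then create a one-cpu set with balancing off as well. */
        if (mkdir("/dev/cpuset/isolated", 0755) && errno != EEXIST)
                perror("mkdir");
        write_str("/dev/cpuset/isolated/cpus", "3");
        write_str("/dev/cpuset/isolated/mems", "0");
        write_str("/dev/cpuset/isolated/sched_load_balance", "0");
        return 0;
}
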
-Greg
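
P.S. To approximate your waker test, something like the following
should do: one SCHED_FIFO thread pinned to each isolated cpu, blocking
and waking in a tight loop (untested; the thread count, priority, and
sleep interval are arbitrary):

/*
 * Each sleep/wake changes that cpu's top RT priority, so every
 * iteration goes through the cpupri update path. Run as root; build
 * with -lpthread.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <time.h>

#define NTHREADS 8      /* one per isolated cpu */

static void *rt_waker(void *arg)
{
        long cpu = (long)arg;
        cpu_set_t set;
        struct sched_param sp = { .sched_priority = 50 };
        struct timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 };

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);

        for (;;)
                nanosleep(&ts, NULL);   /* block, then wake */
        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];
        long i;

        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, rt_waker, (void *)i);
        pthread_join(tid[0], NULL);     /* threads run forever */
        return 0;
}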