public inbox for linux-kernel@vger.kernel.org
From: Nick Piggin <piggin@cyberone.com.au>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Anton Blanchard <anton@samba.org>, Ingo Molnar <mingo@redhat.com>,
	"Martin J. Bligh" <mbligh@aracnet.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	Mark Wong <markw@osdl.org>
Subject: Re: [CFT][RFC] HT scheduler
Date: Sat, 13 Dec 2003 17:43:35 +1100	[thread overview]
Message-ID: <3FDAB517.4000309@cyberone.com.au> (raw)
In-Reply-To: <20031213022038.300B22C2C1@lists.samba.org>



Rusty Russell wrote:

>In message <3FD9679A.1020404@cyberone.com.au> you write:
>
>>Thanks for having a look Rusty. I'll try to convince you :)
>>
>>As you know, the domain classes is not just for HT, but can do multi levels
>>of NUMA, and it can be built by architecture specific code which is good
>>for Opteron, for example. It doesn't need CONFIG_SCHED_SMT either, of 
>>course,
>>or CONFIG_NUMA even: degenerate domains can just be collapsed (code isn't
>>there to do that now).
>>
>
>Yes, but this isn't what we really want.  I'm actually accusing you of
>lacking ambition 8)
>
>
>>Shared runqueues I find aren't so flexible. I think they perfectly describe
>>the P4 HT architecture, but what happens if (when) siblings get separate
>>L1 caches? What about SMT, CMP, SMP and NUMA levels in the POWER5?
>>
>
>It describes every HyperThread implementation I am aware of today, so
>it suits us fine for the moment.  Runqueues may still be worth sharing
>even if L1 isn't, for example.
>

Possibly. But it restricts your load balancing to a specific case and it
eliminates any possibility of CPU affinity: take 4 running threads on 1 HT
CPU, for example; they'll happily ping-pong from one cpu to the other.

I could get domains to do the same thing, but at the moment a CPU only looks
at its sibling's runqueue if the pair is unbalanced or it is about to become
idle. I'm pretty sure domains can do anything shared runqueues can; I don't
know if you're disputing this or not?

>
>>The large SGI (and I imagine IBM's POWER5s) systems need things like
>>progressive balancing backoff and would probably benefit with a more
>>hierarchical balancing scheme so all the balancing operations don't kill
>>the system.
>>
>
>But this is my point.  Scheduling is one part of the problem.  I want
>to be able to have the arch-specific code feed in a description of
>memory and cpu distances, bandwidths and whatever, and have the
>scheduler, slab allocator, per-cpu data allocation, page cache, page
>migrator and anything else which cares adjust itself based on that.
>
>Power 4 today has pairs of CPUs on a die, four dies on a board, and
>four boards in a machine.  I want one infrastructure to describe it,
>not have to program every infrastructure from arch-specific code.
>

(Plus two threads / siblings per CPU, right?)

I agree with you here. You know, we could rename struct sched_domain and add
a few fields to it, and it becomes what you want. It's a _hierarchical set_
of _sets of cpus sharing a certain property_ (underlining to aid grouping).

Uniform access to certain memory ranges could easily be one of these
properties. There is already some info about the amount of cache shared;
that could also be expanded on.

(Perhaps some exotic architecture would like scheduling and memory a bit
more decoupled, but designing for *that* before hitting it would be
over-engineering.)

I'm not going to do that, because 2.6 doesn't need a generalised topology:
nothing makes use of it. Perhaps if something really good came up in 2.7,
there would be a case for backporting it. 2.6 does need improvements to the
scheduler, though.

>
>>w26 does ALL this, while sched.o is 3K smaller than Ingo's shared runqueue
>>patch on NUMA and SMP, and 1K smaller on UP (although sched.c is 90 lines
>>longer). kernbench system time is down nearly 10% on the NUMAQ, so it isn't
>>hurting performance either.
>>
>
>Agreed, but Ingo's shared runqueue patch is a poor implementation of a
>good idea: I've always disliked it.  I'm halfway through updating my
>patch, and I really think you'll like it better.  It's not
>incompatible with NUMA changes, in fact it's fairly non-invasive.
>

But if sched domains are accepted, there is no need for shared runqueues:
as I said, sched domains can do anything shared runqueues can, so the code
would just be a redundant specialisation, unless you specifically wanted to
share locks & data with siblings.

I must admit I didn't look at your implementation; I look forward to it.
I'm not against shared runqueues. If my stuff doesn't get taken then of
course I'd rather shrq get in than nothing at all, as I told Ingo. I
obviously just like sched domains better ;)




Thread overview: 80+ messages
2003-12-08  4:25 [PATCH][RFC] make cpu_sibling_map a cpumask_t Nick Piggin
2003-12-08 15:59 ` Anton Blanchard
2003-12-08 23:08   ` Nick Piggin
2003-12-09  0:14     ` Anton Blanchard
2003-12-11  4:25       ` [CFT][RFC] HT scheduler Nick Piggin
2003-12-11  7:24         ` Nick Piggin
2003-12-11  8:57           ` Nick Piggin
2003-12-11 11:52             ` William Lee Irwin III
2003-12-11 13:09               ` Nick Piggin
2003-12-11 13:23                 ` William Lee Irwin III
2003-12-11 13:30                   ` Nick Piggin
2003-12-11 13:32                     ` William Lee Irwin III
2003-12-11 15:30                       ` Nick Piggin
2003-12-11 15:38                         ` William Lee Irwin III
2003-12-11 15:51                           ` Nick Piggin
2003-12-11 15:56                             ` William Lee Irwin III
2003-12-11 16:37                               ` Nick Piggin
2003-12-11 16:40                                 ` William Lee Irwin III
2003-12-12  1:52                         ` [PATCH] improve rwsem scalability (was Re: [CFT][RFC] HT scheduler) Nick Piggin
2003-12-12  2:02                           ` Nick Piggin
2003-12-12  9:41                           ` Ingo Molnar
2003-12-13  0:07                             ` Nick Piggin
2003-12-14  0:44                               ` Nick Piggin
2003-12-17  5:27                                 ` Nick Piggin
2003-12-19 11:52                                   ` Nick Piggin
2003-12-19 15:06                                     ` Martin J. Bligh
2003-12-20  0:08                                       ` Nick Piggin
2003-12-12  0:58             ` [CFT][RFC] HT scheduler Rusty Russell
2003-12-11 10:01           ` Rhino
2003-12-11  8:14             ` Nick Piggin
2003-12-11 16:49               ` Rhino
2003-12-11 15:16                 ` Nick Piggin
2003-12-11 11:40             ` William Lee Irwin III
2003-12-11 17:05               ` Rhino
2003-12-11 15:17                 ` William Lee Irwin III
2003-12-11 16:28         ` Kevin P. Fleming
2003-12-11 16:41           ` Nick Piggin
2003-12-12  2:24         ` Rusty Russell
2003-12-12  7:00           ` Nick Piggin
2003-12-12  7:23             ` Rusty Russell
2003-12-13  6:43               ` Nick Piggin [this message]
2003-12-14  1:35                 ` bill davidsen
2003-12-14  2:18                   ` Nick Piggin
2003-12-14  4:32                     ` Jamie Lokier
2003-12-14  9:40                       ` Nick Piggin
2003-12-14 10:46                         ` Arjan van de Ven
2003-12-16 17:46                         ` Bill Davidsen
2003-12-16 18:22                       ` Linus Torvalds
2003-12-17  0:24                         ` Davide Libenzi
2003-12-17  0:41                           ` Linus Torvalds
2003-12-17  0:54                             ` Davide Libenzi
2003-12-16 17:34                     ` Bill Davidsen
2003-12-15  5:53                 ` Rusty Russell
2003-12-15 23:08                   ` Nick Piggin
2003-12-19  4:57                     ` Nick Piggin
2003-12-19  5:13                       ` Nick Piggin
2003-12-20  2:43                       ` Rusty Russell
2003-12-21  2:56                         ` Nick Piggin
2004-01-03 18:57                   ` Bill Davidsen
2003-12-15 20:21                 ` Zwane Mwaikambo
2003-12-15 23:20                   ` Nick Piggin
2003-12-16  0:11                     ` Zwane Mwaikambo
2003-12-12  8:59             ` Nick Piggin
2003-12-12 15:14               ` Martin J. Bligh
2003-12-08 19:44 ` [PATCH][RFC] make cpu_sibling_map a cpumask_t James Cleverdon
2003-12-08 20:38 ` Ingo Molnar
2003-12-08 20:51 ` Zwane Mwaikambo
2003-12-08 20:55   ` Ingo Molnar
2003-12-08 23:17     ` Nick Piggin
2003-12-08 23:36       ` Ingo Molnar
2003-12-08 23:58         ` Nick Piggin
2003-12-08 23:46 ` Rusty Russell
2003-12-09 13:36   ` Nick Piggin
2003-12-11 21:41     ` bill davidsen
2003-12-14 16:26             ` [CFT][RFC] HT scheduler Andi Kleen
2003-12-14 16:54               ` Arjan van de Ven
     [not found] <200312161127.13691.habanero@us.ibm.com>
2003-12-16 17:37 ` Andrew Theurer
2003-12-17  2:41   ` Nick Piggin
  -- strict thread matches above, loose matches on Subject: below --
2003-12-16 19:03 Nakajima, Jun
2003-12-17  0:38 Nakajima, Jun
