public inbox for linux-kernel@vger.kernel.org
From: Erich Focht <efocht@hpce.nec.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Martin J. Bligh" <mbligh@aracnet.com>,
	Ingo Molnar <mingo@elte.hu>, Andi Kleen <ak@suse.de>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	Rick Lindsley <ricklind@us.ibm.com>,
	linux-kernel@vger.kernel.org, akpm@osdl.org, kernel@kolivas.org,
	rusty@rustcorp.com.au, anton@samba.org,
	lse-tech@lists.sourceforge.net
Subject: Re: [Lse-tech] [patch] sched-domain cleanups, sched-2.6.5-rc2-mm2-A3
Date: Thu, 1 Apr 2004 00:23:05 +0200	[thread overview]
Message-ID: <200404010023.05642.efocht@hpce.nec.com> (raw)
In-Reply-To: <406A2819.6020009@yahoo.com.au>

On Wednesday 31 March 2004 04:08, Nick Piggin wrote:
> >>I'm with Martin here, we are just about to merge all this
> >>sched-domains stuff. So we should at least wait until after
> >>that. And of course, *nothing* gets changed without at least
> >>one benchmark that shows it improves something. So far
> >>nobody has come up to the plate with that.
> >
> > I thought you were talking the whole time about STREAM. That is THE
> > benchmark which shows you an impact of balancing at fork, and it is a
> > VERY relevant benchmark. Though you shouldn't run it on historical
> > machines like NUMAQ; no compute center in the western world will buy
> > NUMAQs for high performance... Andi typically runs STREAM on all CPUs
> > of a machine. Try on N/2 and N/4 and so on, you'll see the impact.
>
> Well yeah, but the immediate problem was that sched-domains was
> *much* worse than 2.6's numasched, neither of which balance on
> fork/clone. I didn't want to obscure the issue by implementing
> balance on fork/clone until we worked out exactly the problem.

I had the feeling that solving the performance issue reported by Andi
would ease the integration into the baseline...

> >>The point is that we have never had this before, and nobody
> >>(until now) has been asking for it. And there are as yet no
> >
> > ?? Sorry, I've had balance at fork since 2001 in the NEC IA64 NUMA
> > kernels and users use it intensively with OpenMP. Advertised it a lot,
> > asked for it, talked about it at the last OLS. Only IA64 was
> > considered rare big iron. I understand that the issue gets hotter if
> > the problem hurts on AMD64...
>
> Sorry I hadn't realised. I guess because you are happy with
> your own stuff you don't make too much noise about it on the
> list lately. I apologise.

The usual excuse: busy with other stuff...

> I wonder though, why don't you just teach OpenMP to use
> affinities as well? Surely that is better than relying on the
> behaviour of the scheduler, even if it does balance on clone.

You mean in the compiler? I don't think this is a good idea: that way you
lose flexibility in resource overcommitment, and performance when overselling
the machine's CPUs.

> > Again: I'm talking about having the choice. The user decides. Nothing
> > protects you against user stupidity, but if they just have the choice
> > of poor automatic initial scheduling, it's not enough. And: having the
> > fork/clone initial balancing policy means: you don't need to make your
> > code complicated and unportable by playing with setaffinity (which is
> > just plainly unusable when you share the machine with other users).
>
> If you do it by hand, you know exactly what is going to happen,
> and you can turn off the balance-on-clone flags and you don't
> incur the hit of pulling in remote cachelines from every CPU at
> clone time to do balancing. Surely an HPC application wouldn't
> mind doing that? (I guess they probably don't call clone a lot
> though).

OpenMP is implemented with clone. MPI parallel applications just exec,
they're fine. IMO the static affinity/cpumask handling should be done
externally by some resource manager which has a good overview of the
long-term load of the machine. It's a different issue, nothing for the
scheduler. I wouldn't leave it to the program; that's too inflexible and
unportable across machines and OSes.

> > There are companies mainly selling NUMA machines for HPC (SGI?), so
> > this is not a niche market. Clusters of big NUMA machines are not
> > unusual, and they're typically not used for databases but for HPC
> > apps. Unfortunately proprietary UNIX is still considered to have
> > better features than Linux for such configurations.
>
> Well, SGI should be doing tests soon and tuning the scheduler
> to their liking. Hopefully others will too, so we'll see what
> happens.

Maybe they are happy with their stuff, too. They have the cpumemsets
and some external affinity control, AFAIK.

> >>Let's just make sure we don't change defaults without any
> >>reason...
> >
> > No reason? Aaarghh...   >;-)
>
> Sorry I mean evidence. I'm sure with a properly tuned
> implementation, you could get really good speedups in lots
> of places... I just want to *see* them. All I have seen so
> far is Andi getting a bit better performance on something
> where he can get *much* better performance by making a
> trivial tweak instead.

I get the feeling that Andi's simple OpenMP job is already complex
enough to lead to wrong initial scheduling with the current approach.
I suppose the reason is the 1-2 helper threads which are started
together with the worker threads (depending on the compiler used).
On small machines (and 4 CPUs is small) they significantly disturb
the initial task distribution. For example, with the Intel compiler
and 4 worker threads you get 6 tasks. The helper tasks are typically
runnable when the code starts, so you get (in order of creation)
   CPU   Task   Role
    1     1     worker
    2     2     helper
    3     3     helper
    4     4     worker
    1-4   5     worker
    1-4   6     worker

So the difficulty is to find out which task will do real work and
which task is just spoiling the statistics. I think...

Regards,
Erich


