From: Jes Sorensen <jes@sgi.com>
To: Andi Kleen <ak@suse.de>
Cc: Tony Luck <tony.luck@intel.com>,
discuss@x86-64.org, linux-kernel@vger.kernel.org,
libc-alpha@sourceware.org, vojtech@suse.cz,
linux-ia64@vger.kernel.org
Subject: Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
Date: Fri, 16 Jun 2006 09:48:03 +0000 [thread overview]
Message-ID: <yq0ejxp2zzg.fsf@jaguar.mkp.net> (raw)
In-Reply-To: <200606160822.23898.ak@suse.de>
>>>>> "Andi" = Andi Kleen <ak@suse.de> writes:
Andi> On Thursday 15 June 2006 20:44, Tony Luck wrote:
>> Another alternative would be to provide a mechanism for a process
>> to bind to the current cpu (whatever cpu that happens to be). Then
>> the kernel gets to make the smart placement decisions, and
>> processes that want to be bound somewhere (but don't really care
>> exactly where) have a way to meet their need. Perhaps a cpumask of
>> all zeroes to a sched_setaffinity call could be overloaded for
>> this?
Andi> I tried something like this a few years ago and it just didn't
Andi> work (or rather ran usually slower) The scheduler would select a
Andi> home node at startup and then try to move the process there.
Andi> The problem is that not using a CPU costs you much more than
Andi> whatever overhead you get from using non local memory.
It all depends on your application and the type of system you are
running on. What you say applies to smaller cpu counts. However once
we see the upcoming larger count multi-core cpus become commonly
available, this is likely to change and become more like what is seen
today on larger NUMA systems.
In the scientific application space, there are two very common
groupings of jobs. One is simply a large threaded application with a
lot of intercommunication, often via MPI. In many cases one ends up
running a job on just a subset of the system, in which case you want
to see threads placed on the same node(s) to minimize internode
communication. It is desirable to either force the other tasks on the
system (system daemons etc) onto other node(s) to reduce noise and
there could also be space to run another parallel job on the remaining
node(s).
The other common case is to have jobs which spawn off a number of
threads that work together in groups (via OpenMP). In this case you
would like to have all your OpenMP threads placed on the same node for
similar reasons.
Not getting this right can result in significant loss of performance
for jobs which are highly memory bound or rely heavily on
intercommunication and synchronization.
Andi> So by default filling the CPUs must be the highest priority and
Andi> memory policy cannot interfere with that.
I really don't think this approach is going to solve the problem. As
Tony also points out, tasks will eventually migrate. The user needs to
tell the kernel where it wants to run the tasks rather than the kernel
telling the task where it is located. Only the application (or
developer/user) knows how the threads are expected to behave, doing
this automatically is almost never going to be optimal. Obviously the
user needs visibility of the topology of the machine to do so but that
should be available on any NUMA system through /proc or /sys.
In the scientific space the jobs are often run repeatedly with new
data sets every time, so it is worthwhile to spend the effort up front
to get the placement right. One-off runs are obviously something else
and there your method is going to be more beneficial.
IMHO, what we really need is a more advanced way for user applications
to hint at the kernel how to place it's threads.
Cheers,
Jes
next parent reply other threads:[~2006-06-16 9:48 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <200606140942.31150.ak@suse.de>
[not found] ` <12c511ca0606151144i140c21e5w90dd948af9b536a4@mail.gmail.com>
[not found] ` <200606160822.23898.ak@suse.de>
2006-06-16 9:48 ` Jes Sorensen [this message]
2006-06-16 10:09 ` FOR REVIEW: New x86-64 vsyscall vgetcpu() Andi Kleen
2006-06-16 11:02 ` Jes Sorensen
2006-06-16 11:17 ` Andi Kleen
2006-06-16 11:58 ` Jes Sorensen
2006-06-16 12:36 ` Zoltan Menyhart
2006-06-16 12:41 ` Jes Sorensen
2006-06-16 12:48 ` Zoltan Menyhart
2006-06-16 21:04 ` Chase Venters
2006-06-16 14:56 ` Andi Kleen
2006-06-16 15:31 ` Zoltan Menyhart
2006-06-16 15:37 ` Andi Kleen
2006-06-16 15:58 ` Jakub Jelinek
2006-06-16 16:24 ` Andi Kleen
2006-06-16 16:33 ` Jakub Jelinek
2006-06-16 21:12 ` Chase Venters
2006-06-16 15:36 ` Brent Casavant
2006-06-16 15:40 ` Andi Kleen
2006-06-16 21:15 ` Chase Venters
2006-06-16 21:19 ` Chase Venters
2006-06-16 23:40 ` Brent Casavant
2006-06-17 6:58 ` Andi Kleen
2006-06-17 6:55 ` [discuss] " Andi Kleen
2006-06-19 8:42 ` Zoltan Menyhart
2006-06-19 8:54 ` Andi Kleen
2006-06-16 14:54 ` Andi Kleen
2006-06-20 8:28 ` Jes Sorensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=yq0ejxp2zzg.fsf@jaguar.mkp.net \
--to=jes@sgi.com \
--cc=ak@suse.de \
--cc=discuss@x86-64.org \
--cc=libc-alpha@sourceware.org \
--cc=linux-ia64@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tony.luck@intel.com \
--cc=vojtech@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox