From: Ingo Molnar <mingo@elte.hu>
To: Andrew Morton <akpm@osdl.org>
Cc: Paul Jackson <pj@sgi.com>,
dgc@sgi.com, steiner@sgi.com, Simon.Derr@bull.net, ak@suse.de,
linux-kernel@vger.kernel.org, clameter@sgi.com
Subject: Re: [PATCH 1/5] cpuset memory spread basic implementation
Date: Mon, 6 Feb 2006 07:02:43 +0100 [thread overview]
Message-ID: <20060206060243.GA11918@elte.hu> (raw)
In-Reply-To: <20060205203711.2c855971.akpm@osdl.org>
* Andrew Morton <akpm@osdl.org> wrote:
> Paul Jackson <pj@sgi.com> wrote:
> >
> > This policy can provide substantial improvements for jobs that
> > need to place thread-local data on the corresponding node, but
> > that need to access large file system data sets that must be
> > spread across the several nodes in the job's cpuset in order
> > to fit. Without this patch, especially for jobs that might
> > have one thread reading in the data set, the memory allocation
> > across the nodes in the job's cpuset can become very uneven.
>
>
> It all seems rather ironic. We do vast amounts of development to make
> certain microbenchmarks look good, then run a real workload on the
> thing, find that all those microbenchmark-inspired tweaks actually
> deoptimised the real workload? So now we need to add per-task knobs
> to turn off the previously-added microbenchmark-tweaks.
>
> What happens if one process does lots of filesystem activity and
> another one (concurrent or subsequent) wants lots of thread-local
> storage? Won't the same thing happen?
>
> IOW: this patch seems to be a highly specific bandaid which is
> repairing an ill-advised problem of our own making, does it not?
I suspect it all depends on whether the workload is 'global' or 'local'.
Let's consider the hypothetical case of a 16-node box with 64 CPUs and 1
TB of RAM, which could have two fundamental types of workloads:
- lots of per-CPU tasks which are highly independent and each does its
own stuff. For this case we really want to allocate everything per-CPU
and as close to the task as possible.
- 90% of the 1 TB of RAM is in a shared 'database' that is accessed by
all nodes in an unpredictable pattern, from any CPU. For this case we
want to 'spread out' the pagecache as much as possible. If we don't
spread it out then one task - e.g. an initialization process - could
create a really bad distribution for the pagecache: big contiguous
ranges allocated on the same node. If the workload has randomness but
also occasional "medium range" locality, a disproportionate share of
the accesses could go to the one node hosting some big contiguous
chunk of the database. So we want to spread out in as fine-grained a
way as possible. (Perhaps not too fine-grained though, so that block
IO can still be reasonably batched.)
We normally optimize for the first case, and it works pretty well on
both SMP and NUMA. We do pretty well with the second workload on SMP,
but on NUMA, the non-spreading can hurt. So it makes sense to
artificially 'interleave' all the RAM that goes into the pagecache, to
get a good statistical distribution of pages.
Neither workload is broken, nor did we make a design mistake by
optimizing the SMP case for the first one - it really is the common
thing on most boxes. But the second workload does happen too, and it
conflicts with the first workload's needs. The difference between the
workloads cannot be bridged by the kernel: they are two very different
access patterns that result from the problem the application is trying
to solve - the kernel cannot influence that.
I suspect there is no other way but to let the application tell the
kernel which strategy it wants used.
Ingo