From: Ingo Molnar <mingo@elte.hu>
To: Andrew Morton <akpm@osdl.org>
Cc: Paul Jackson <pj@sgi.com>,
dgc@sgi.com, steiner@sgi.com, Simon.Derr@bull.net, ak@suse.de,
linux-kernel@vger.kernel.org, clameter@sgi.com
Subject: Re: [PATCH 1/5] cpuset memory spread basic implementation
Date: Mon, 6 Feb 2006 07:02:43 +0100 [thread overview]
Message-ID: <20060206060243.GA11918@elte.hu> (raw)
In-Reply-To: <20060205203711.2c855971.akpm@osdl.org>
* Andrew Morton <akpm@osdl.org> wrote:
> Paul Jackson <pj@sgi.com> wrote:
> >
> > This policy can provide substantial improvements for jobs that
> > need to place thread-local data on the corresponding node, but
> > that need to access large file system data sets that must be
> > spread across the several nodes in the job's cpuset in order
> > to fit. Without this patch, especially for jobs that might
> > have one thread reading in the data set, the memory allocation
> > across the nodes in the job's cpuset can become very uneven.
>
>
> It all seems rather ironic. We do vast amounts of development to make
> certain microbenchmarks look good, then run a real workload on the
> thing, find that all those microbenchmark-inspired tweaks actually
> deoptimised the real workload? So now we need to add per-task knobs
> to turn off the previously-added microbenchmark-tweaks.
>
> What happens if one process does lots of filesystem activity and
> another one (concurrent or subsequent) wants lots of thread-local
> storage? Won't the same thing happen?
>
> IOW: this patch seems to be a highly specific bandaid which is
> repairing an ill-advised problem of our own making, does it not?
I suspect it all depends on whether the workload is 'global' or 'local'.
Let's consider the hypothetical case of a 16-node box with 64 CPUs and 1
TB of RAM, which could have two fundamental types of workloads:
- lots of per-CPU tasks which are highly independent and each does its
own stuff. For this case we really want to allocate everything per-CPU
and as close to the task as possible.
- 90% of the 1 TB of RAM is in a shared 'database' that is accessed by
all nodes in an unpredictable pattern, from any CPU. For this case we
want to 'spread out' the pagecache as much as possible. If we don't
spread it out then one task - e.g. an initialization process - could
create a really bad distribution for the pagecache: big contiguous
ranges allocated on the same node. If the workload has randomness but
also occasional "medium range" locality, a disproportionate share of
the accesses could go to the one node hosting some big contiguous
chunk of the database. So we want to spread out in as fine-grained a
way as possible. (Perhaps not too fine-grained though, so that block
IO can still be reasonably batched.)
We normally optimize for the first case, and it works pretty well on
both SMP and NUMA. We do pretty well with the second workload on SMP,
but on NUMA, the non-spreading can hurt. So it makes sense to
artificially 'interleave' all the RAM that goes into the pagecache, to
get a good statistical distribution of pages.
Neither workload is broken, nor did we make a design mistake by
optimizing the SMP case for the first one - it really is the common
thing on most boxes. But the second workload does happen too, and it
conflicts with the first workload's needs. The difference between the
workloads cannot be bridged by the kernel: they are two very different
access patterns that result from the problem the application is trying
to solve - the kernel cannot influence that.
I suspect there is no other way but to let the application tell the
kernel which strategy it wants used.
Ingo