Re: Avoid allocating during interleave from almost full nodes

From: Andrew Morton <akpm@osdl.org>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Avoid allocating during interleave from almost full nodes
Date: Mon, 6 Nov 2006 11:59:25 -0800
Message-ID: <20061106115925.1dd41a77.akpm@osdl.org>
In-Reply-To: <Pine.LNX.4.64.0611060846070.25351@schroedinger.engr.sgi.com>

On Mon, 6 Nov 2006 08:53:22 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 3 Nov 2006, Andrew Morton wrote:
> 
> > This has almost nothing to do with elapsed time.
> > 
> > How about doing, in free_pages_bulk():
> > 
> > 	if (zone->over_interleave_pages) {
> > 		zone->over_interleave_pages = 0;
> > 		node_clear(zone_to_nid(zone), full_interleave_nodes);
> > 	}
> 
> Hmmm... We would also have to compare to the minimum pages
> required before clearing the node.

OK.

> Isn't it a bit much to have two
> comparisons added to the page free path?

Page freeing is not actually a fastpath.  It's rate-limited by the
frequency at which the CPU can _use_ the page: by filling it from disk, or
by writing to all of the page with the CPU.

Plus this is free_pages_bulk(), so the additional test occurs once per
per_cpu_pages.batch pages, not once per page.

And I assume it could be brought down to a single comparison with some
thought.
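
A minimal user-space sketch of that check, assuming a simplified stand-in
for struct zone.  Only over_interleave_pages and full_interleave_nodes come
from the snippet quoted above; zone_model, min_interleave_pages, MAX_NODES
and free_pages_bulk_model() are hypothetical names for illustration, not
kernel code:

```c
#include <stdbool.h>

#define MAX_NODES 8	/* hypothetical node limit for this sketch */

/* Simplified stand-in for struct zone (assumed layout, not the kernel's). */
struct zone_model {
	int nid;				/* node this zone belongs to */
	unsigned long free_pages;		/* pages currently free */
	unsigned long min_interleave_pages;	/* hypothetical threshold */
	bool over_interleave_pages;		/* set when the node was marked full */
};

/* Stand-in for the full_interleave_nodes nodemask. */
static bool full_interleave_nodes[MAX_NODES];

/*
 * Models one bulk free: the test runs once per batch of 'count' pages,
 * not once per page.  When the flag is clear, only the first comparison
 * is evaluated.
 */
static void free_pages_bulk_model(struct zone_model *zone, unsigned long count)
{
	zone->free_pages += count;

	if (zone->over_interleave_pages &&
	    zone->free_pages >= zone->min_interleave_pages) {
		zone->over_interleave_pages = false;
		full_interleave_nodes[zone->nid] = false;
	}
}
```

In this model a zone that was never marked full pays for a single flag
test per batch; the threshold comparison only runs for zones that are
currently excluded from interleaving.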

> > > It is needlessly expensive if it's done for an allocation that is not bound
> > > to a specific node and there are other nodes with free pages. We may throw
> > > out pages that we need later.
> > 
> > Well it grossly changes the meaning of "interleaving".  We might as well
> > call it something else.  It's not necessarily worse, but it's not
> > interleaved any more.
> 
> It is going from node to node unless there is significant imbalance with 
> some nodes being over the limit and some under. Then the allocations will 
> take place round robin from the nodes under the limit until all are under 
> the limit. Then we continue going over all nodes again.

<head spins>

> > Actually by staying on the same node for a string of successive allocations
> > it could well be quicker.  How come MPOL_INTERLEAVE doesn't already do some
> > batching?   Or does it, and I missed it?
> 
> It should do interleaving because the data is to be accessed from multiple 
> nodes.

I think you missed the point.

At present the code does interleaving by taking one page from each zone and
then advancing onto the next zone, yes?

If so, this is pretty awful from a cache utilisation POV.  It'd be much
better to take 16 pages from one zone before advancing onto the next one.
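
A rough sketch of what such batching could look like, written as a
user-space model rather than the real MPOL_INTERLEAVE code.  The batch
size of 16 is the figure suggested above; interleave_state,
INTERLEAVE_BATCH and interleave_next_batched() are hypothetical names,
and the sketch rotates over node ids rather than zones:

```c
#define INTERLEAVE_BATCH 16	/* pages taken from one node before advancing */

/* Hypothetical per-task interleave cursor. */
struct interleave_state {
	unsigned int nr_nodes;	/* number of nodes in the interleave set */
	unsigned int cur_node;	/* node currently being allocated from */
	unsigned int taken;	/* pages taken from cur_node in this batch */
};

/*
 * Return the node for the next page.  The cursor stays on the same node
 * for INTERLEAVE_BATCH consecutive calls and then moves to the next one.
 */
static unsigned int interleave_next_batched(struct interleave_state *s)
{
	unsigned int nid = s->cur_node;

	if (++s->taken == INTERLEAVE_BATCH) {
		s->taken = 0;
		s->cur_node = (s->cur_node + 1) % s->nr_nodes;
	}
	return nid;
}
```

With three nodes this hands out 16 pages from node 0, then 16 from node 1,
then 16 from node 2, and wraps around, so pages are still spread across
all nodes but in runs instead of one at a time.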

> Clustering on a single node may create hotspots or imbalances. 

Umm, but that's exactly what the patch we're discussing will do.

> Hmmm... We should check how many nodes are remaining; if there is just a
> single node left then we need to ignore the limit.

yup.
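
One possible reading of that rule, as a self-contained sketch: skip nodes
marked full during round robin, but stop honouring the limit once at most
one allowed node is still under it.  MAX_NODES, the boolean arrays and
next_interleave_node() are assumptions made for this illustration and do
not correspond to the patch under discussion:

```c
#include <stdbool.h>

#define MAX_NODES 8	/* hypothetical node limit for this sketch */

/*
 * Pick the node after 'prev' (use -1 for the first call) in round-robin
 * order.  'allowed' marks nodes in the interleave set, 'full' marks nodes
 * over the limit.  Returns -1 if no node is allowed at all.
 */
static int next_interleave_node(int prev, const bool allowed[MAX_NODES],
				const bool full[MAX_NODES])
{
	int usable = 0;
	int i;

	/* Count allowed nodes that are still under the limit. */
	for (i = 0; i < MAX_NODES; i++)
		if (allowed[i] && !full[i])
			usable++;

	/* With a single node (or none) left, ignore the limit entirely. */
	bool ignore_limit = usable <= 1;

	for (i = 1; i <= MAX_NODES; i++) {
		int nid = (prev + i) % MAX_NODES;

		if (!allowed[nid])
			continue;
		if (ignore_limit || !full[nid])
			return nid;
	}
	return -1;
}
```

While two or more nodes are under the limit, allocations rotate among
those nodes only; once just one remains, the sketch falls back to plain
round robin over every allowed node so that the last node is not used
exclusively.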
