Re: [PATCH] allocate page caches pages in round robin fasion

From: Ray Bryant <raybry@sgi.com>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: Jesse Barnes <jbarnes@engr.sgi.com>,
	Andrew Morton <akpm@osdl.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	steiner@sgi.com
Subject: Re: [PATCH] allocate page caches pages in round robin fasion
Date: Fri, 13 Aug 2004 11:33:54 -0500
Message-ID: <411CED72.9090307@sgi.com>
In-Reply-To: <fa.g1i2d5e.1kgqq80@ifi.uio.no>

Dave Hansen wrote:
> On Thu, 2004-08-12 at 16:38, Jesse Barnes wrote:
> 
>>On a NUMA machine, page cache pages should be spread out across the system 
>>since they're generally global in nature and can eat up whole nodes worth of 
>>memory otherwise.  This can end up hurting performance since jobs will have 
>>to make off node references for much or all of their non-file data.
> 
> 
> Wouldn't this be painful for any workload that accesses a unique set of
> files on each node?  If an application knows that it is touching truly
> shared data which every node could possibly access, then they can use
> the NUMA API to cause round-robin allocations to occur.  
> 

I suppose some workloads could tell the difference between a locally and a 
globally allocated page cache page.  It all depends on the rate of access to 
data pages versus page cache pages.

For workloads that read in some data and then process that data for a very 
long time (e.g. typical HPC workloads), it is more important to make sure 
those data pages are allocated locally.  The page cache pages are touched much 
less frequently, so making them globally round-robin'd is a marginal 
performance hit.  What we are trying to avoid here is the node filling up 
with page cache pages, resulting in non-local allocations for those data 
pages, which is not a good thing [tm].
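
As a rough illustration of the node-filling problem described above, here is a 
toy Python model.  It is not kernel code: the function name place_pages, its 
parameters, and the "try the preferred node, then the next node in order" 
fallback are all assumptions made for the sketch.  A job on one home node 
first pulls a file into the page cache and then allocates its data pages, and 
the model counts how many data pages end up off node.

```python
def place_pages(num_nodes, node_capacity, cache_pages, data_pages,
                home_node, round_robin_cache):
    """Toy model: return how many data pages land off the home node."""
    free = [node_capacity] * num_nodes  # free pages remaining on each node

    def alloc(preferred):
        # Assumed fallback: try the preferred node, then the following nodes.
        for step in range(num_nodes):
            node = (preferred + step) % num_nodes
            if free[node] > 0:
                free[node] -= 1
                return node
        raise MemoryError("all nodes are full")

    # Phase 1: the job reads its input, which populates the page cache.
    for i in range(cache_pages):
        alloc(i % num_nodes if round_robin_cache else home_node)

    # Phase 2: the job allocates its data pages, preferring the home node.
    off_node = 0
    for _ in range(data_pages):
        if alloc(home_node) != home_node:
            off_node += 1
    return off_node


# 4 nodes of 1000 pages each; 800 page cache pages, then 600 data pages.
local_cache = place_pages(4, 1000, 800, 600, 0, round_robin_cache=False)
spread_cache = place_pages(4, 1000, 800, 600, 0, round_robin_cache=True)
print(local_cache, spread_cache)  # prints "400 0"
```

With these made-up sizes, local page cache placement leaves only 200 free 
pages on the home node, so 400 of the 600 data pages go off node.  Spreading 
the page cache leaves room for all of them locally.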

On the other hand, if your workload spends most of its time doing buffered 
file I/O to a set of pages that will comfortably fit on the node, then it is 
important to have the page cache pages allocated locally.  So I can see the 
need for some program control of placement.

However, using the NUMA API to cause round-robin allocations to occur would 
use the process level policy, right?  So the same decision will be made on how 
to allocate data pages and page cache pages?  Might it not be possible that an 
application would like its page cache pages allocated globally round-robin, 
but it still wants its data pages allocated via MPOL_DEFAULT?

Perhaps what is needed is the ability to associate a mem_policy with the page 
cache allocation (or, perhaps, more generally, a default "kernel storage 
allocation policy" for storage that the kernel allocates on behalf of a 
process).  System admins could set the default according to overall workload 
considerations, and, perhaps, we would allow processes with sufficient 
privilege to set their own policy.
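
To make the idea concrete, a minimal Python sketch of keeping a separate policy 
for page cache allocations might look like the following.  Everything here is 
hypothetical: TaskPolicy, NodePicker, effective_cache_policy and the policy 
names "local" and "interleave" are invented for illustration and are not part 
of the NUMA API or of any kernel interface.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL = "local"            # hypothetical: allocate on the faulting node
INTERLEAVE = "interleave"  # hypothetical: round-robin across all nodes


@dataclass
class TaskPolicy:
    """Hypothetical per-task state: one policy per allocation class."""
    data_policy: str = LOCAL
    cache_override: Optional[str] = None  # requested page cache policy
    privileged: bool = False


def effective_cache_policy(system_default, task):
    # The admin-set default applies unless a sufficiently privileged
    # task has asked for its own page cache policy.
    if task.cache_override is not None and task.privileged:
        return task.cache_override
    return system_default


class NodePicker:
    """Chooses a node for one allocation under a given policy."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self._next = 0  # next node to use for interleaved allocations

    def pick(self, policy, current_node):
        if policy == INTERLEAVE:
            node = self._next
            self._next = (self._next + 1) % self.num_nodes
            return node
        return current_node


picker = NodePicker(4)
task = TaskPolicy()  # data pages stay local, no page cache override
cache_policy = effective_cache_policy(INTERLEAVE, task)
cache_nodes = [picker.pick(cache_policy, current_node=2) for _ in range(5)]
data_node = picker.pick(task.data_policy, current_node=2)
print(cache_nodes, data_node)  # prints "[0, 1, 2, 3, 0] 2"
```

The point of the sketch is only that the page cache decision and the data page 
decision are looked up separately, so one can be round-robin while the other 
stays local.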

> Maybe a per-node watermark on page cache usage would be more useful. 
> Once a node starts to get full, and it's past the watermark, we can go
> and shoot down some of the node's page cache.  If the data access is
> truly global, then it has a good chance of being brought in on a
> different node.

I think this could be inefficient if the file access is truly global and the 
file is large.  (Think of a file that is significantly larger than the local 
memory of any node.)  Pages would be pulled into each node in turn as they are 
accessed, then discarded as they go over the watermark, to be pulled in on 
another node, etc.  It would be better in this case just to round robin the 
allocation on first access and be done with it.
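
A toy Python model of this effect, under stated assumptions: each node scans 
the whole file in turn, a page cached on any node counts as a hit, and a node 
that reaches its page cache limit discards its oldest page (FIFO).  The name 
count_disk_reads and its parameters are made up for the sketch, and the real 
reclaim behavior would certainly differ.

```python
from collections import deque


def count_disk_reads(num_nodes, pages_per_node, file_pages, round_robin):
    """Toy model: count disk reads when every node scans the file once."""
    caches = [deque() for _ in range(num_nodes)]  # per-node FIFO of pages
    where = {}                                    # page -> node caching it
    reads = 0
    for reader in range(num_nodes):       # nodes scan the file one by one
        for page in range(file_pages):
            if page in where:
                continue                  # cached on some node: no disk read
            reads += 1
            # Placement: round-robin by page number, or on the reading node.
            node = page % num_nodes if round_robin else reader
            if len(caches[node]) >= pages_per_node:
                victim = caches[node].popleft()  # over the limit: discard
                del where[victim]
            caches[node].append(page)
            where[page] = node
    return reads


# 4 nodes, 4 page cache pages allowed per node, a 10-page file.
watermark_reads = count_disk_reads(4, 4, 10, round_robin=False)
round_robin_reads = count_disk_reads(4, 4, 10, round_robin=True)
print(watermark_reads, round_robin_reads)  # prints "18 10"
```

In this small example the local-plus-watermark scheme reads 18 pages from disk 
because early pages are discarded and then pulled in again on other nodes, 
while round-robin placement reads each of the 10 pages once.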

> 
> -- Dave
