From: Ray Bryant <raybry@sgi.com>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: Jesse Barnes <jbarnes@engr.sgi.com>,
Andrew Morton <akpm@osdl.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
steiner@sgi.com
Subject: Re: [PATCH] allocate page caches pages in round robin fasion
Date: Fri, 13 Aug 2004 11:33:54 -0500
Message-ID: <411CED72.9090307@sgi.com>
In-Reply-To: <fa.g1i2d5e.1kgqq80@ifi.uio.no>
Dave Hansen wrote:
> On Thu, 2004-08-12 at 16:38, Jesse Barnes wrote:
>
>>On a NUMA machine, page cache pages should be spread out across the system
>>since they're generally global in nature and can eat up whole nodes worth of
>>memory otherwise. This can end up hurting performance since jobs will have
>>to make off node references for much or all of their non-file data.
>
>
> Wouldn't this be painful for any workload that accesses a unique set of
> files on each node? If an application knows that it is touching truly
> shared data which every node could possibly access, then they can use
> the NUMA API to cause round-robin allocations to occur.
>
I suppose it is possible for some workloads to be able to tell the difference
between a locally and globally allocated page cache page. It all depends on
the rate of access of data pages versus page cache pages.
For workloads that read in some data, then process that data for a very long
time (e.g. typical HPC workloads), it is more important to make sure those
data pages are allocated locally. The page cache pages are touched much
less frequently, so making them globally round-robin'd is a marginal
performance hit. The problem we are trying to avoid here is a node filling
up with page cache pages, resulting in non-local allocations
for those data pages, which is not a good thing [tm].
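To make the effect concrete, here is a toy model (not kernel code) of the
scenario above. The function name, node count, and page counts are all
made up for illustration; the only assumption is a local-first allocator
that falls back to the next node with free memory.

```python
def off_node_data_pages(n_nodes, node_capacity, cache_pages, data_pages,
                        home, round_robin_cache):
    """Return how many of a job's data pages land off its home node."""
    free = [node_capacity] * n_nodes

    def alloc(preferred):
        # Local-first: use the preferred node if it has room, otherwise
        # fall back to the next node that still has free pages.
        for step in range(n_nodes):
            node = (preferred + step) % n_nodes
            if free[node] > 0:
                free[node] -= 1
                return node
        raise MemoryError("all nodes are full")

    # The job first reads its input file, which populates the page cache.
    for i in range(cache_pages):
        alloc(i % n_nodes if round_robin_cache else home)

    # It then allocates its working data, always preferring the home node.
    return sum(1 for _ in range(data_pages) if alloc(home) != home)


local = off_node_data_pages(4, 1000, 900, 500, home=0, round_robin_cache=False)
spread = off_node_data_pages(4, 1000, 900, 500, home=0, round_robin_cache=True)
print(local, spread)
```

With these made-up numbers, local page cache placement pushes 400 of the
500 data pages off node, while round-robin placement keeps all of them
local.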
On the other hand, if your workload spends most of its time writing buffered
file I/O to a set of pages that will comfortably fit on a node, then it is
important to have the page cache pages allocated locally. So I can see the
need for some program control of placement.
However, using the NUMA API to cause round-robin allocations to occur would
use the process level policy, right? So the same decision will be made on how
to allocate data pages and page cache pages? Might it not be possible that an
application would like its page cache pages allocated globally round-robin,
but it still wants its data pages allocated via MPOL_DEFAULT?
Perhaps what is needed is the ability to associate a mem_policy with the page
cache allocation (or, perhaps, more generally, a default "kernel storage
allocation policy" for storage that the kernel allocates on behalf of a
process). System admins could set the default according to overall workload
considerations, and, perhaps, we would allow processes with sufficient
privilege to set their own policy.
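A rough sketch of what such a split could look like, as a plain Python
model rather than a kernel interface. Everything here (the Task and System
classes, the "pagecache_policy" field, the privilege check) is
hypothetical and exists only to illustrate the idea; none of it is an
existing API.

```python
DEFAULT = "default"        # local-first, in the spirit of MPOL_DEFAULT
INTERLEAVE = "interleave"  # round-robin across nodes


class Task:
    def __init__(self, data_policy=DEFAULT, privileged=False):
        self.data_policy = data_policy
        # Hypothetical field: None means "inherit the system-wide default".
        self.pagecache_policy = None
        self.privileged = privileged


class System:
    def __init__(self, pagecache_default):
        # Chosen by the administrator from overall workload considerations.
        self.pagecache_default = pagecache_default

    def set_pagecache_policy(self, task, policy):
        # Only sufficiently privileged tasks may override the default.
        if not task.privileged:
            raise PermissionError("task may not override page cache policy")
        task.pagecache_policy = policy

    def policy_for(self, task, kind):
        # Data pages and page cache pages are decided independently.
        if kind == "data":
            return task.data_policy
        return task.pagecache_policy or self.pagecache_default


system = System(pagecache_default=INTERLEAVE)
hpc_job = Task()
print(system.policy_for(hpc_job, "data"), system.policy_for(hpc_job, "pagecache"))
```

Here the HPC-style job keeps local-first placement for its data pages
while its page cache pages follow the system-wide interleave default.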
> Maybe a per-node watermark on page cache usage would be more useful.
> Once a node starts to get full, and it's past the watermark, we can go
> and shoot down some of the node's page cache. If the data access is
> truly global, then it has a good chance of being brought in on a
> different node.
I think this could be inefficient if the file access is truly global and the
file is large. (Think of a file that is significantly larger than the local
memory of any node.) Pages would be pulled into each node in turn as they are
accessed, then discarded as they go over the watermark, to be pulled in on
another node, etc. It would be better in this case just to round robin the
allocation on first access and be done with it.
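A toy model of that pattern, under simplifying assumptions that are mine
and not part of the proposal: each node scans the whole file once, in
turn; a cached page lives on exactly one node; and a node that goes over
its watermark drops its oldest page cache page. The function name and all
numbers are made up for illustration.

```python
from collections import OrderedDict


def disk_reads(n_nodes, file_pages, watermark=None):
    """Each node scans the whole file once, in turn; count disk reads.

    watermark=None: pages are placed round-robin on first access and
                    never evicted.
    watermark=W:    pages are placed on the accessing node, and a node
                    holding more than W page cache pages drops its oldest.
    """
    where = {}                                          # page -> caching node
    per_node = [OrderedDict() for _ in range(n_nodes)]  # oldest page first
    reads = 0
    next_rr = 0

    for reader in range(n_nodes):
        for page in range(file_pages):
            if page in where:
                continue          # cached somewhere: a (possibly remote) hit
            reads += 1
            if watermark is None:
                node = next_rr
                next_rr = (next_rr + 1) % n_nodes
            else:
                node = reader
            where[page] = node
            per_node[node][page] = True
            if watermark is not None and len(per_node[node]) > watermark:
                victim, _ = per_node[node].popitem(last=False)
                del where[victim]
    return reads


print(disk_reads(4, 1000), disk_reads(4, 1000, watermark=400))
```

For a 1000-page file on 4 nodes with a 400-page watermark, this model
reads 1800 pages from disk, against 1000 when pages are round-robin'd on
first access.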
>
> -- Dave
>