From: Ray Bryant <raybry@sgi.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Martin Hicks <mort@wildopensource.com>,
	pj@sgi.com, linux-kernel@vger.kernel.org,
	Martin Hilgeman <hilgeman@sgi.com>
Subject: Re: [PATCH/RFC] A method for clearing out page cache
Date: Mon, 21 Feb 2005 16:28:11 -0600
Message-ID: <421A607B.4050606@sgi.com>
In-Reply-To: <20050221134220.2f5911c9.akpm@osdl.org>

Andrew Morton wrote:
> Martin Hicks <mort@wildopensource.com> wrote:
> 
>>This patch introduces a new sysctl for NUMA systems that tries to drop
>> as much of the page cache as possible from a set of nodes.  The
>> motivation for this patch is for setting up High Performance Computing
>> jobs, where initial memory placement is very important to overall
>> performance.
> 
> 
> - Using a write to /proc for this seems a bit hacky.  Why not simply add
>   a new system call for it?
> 

We did it this way because it was easier to get it into SLES9.
There is no particular reason that we couldn't use a system call;
we just figured that adding system calls is hard.

> - Starting a kernel thread for each node might be overkill.  Yes, it
>   would take longer if one process was to do all the work, but does this
>   operation need to be very fast?
>

It is possible that this call will need to be executed at the start of
each batch job in the system.  We used kernel threads because there was
no good way to get concurrency out of a single write to /proc.

>   If it does, then userspace could arrange for that concurrency by
>   starting a number of processes to perform the toss, each with a different
>   nodemask.
> 

That works fine as well if we can get a system call number assigned and
avoids the hackiness of both /proc and the kernel threads.
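
To illustrate the userspace approach, here is a minimal sketch that forks one
worker per node and waits for all of them.  The per-node call does not exist
yet, so toss_placeholder() below is a hypothetical stand-in for whatever
interface is eventually agreed on; toss_nodes_in_parallel() and toss_fn are
likewise names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Per-node worker: returns 0 on success, nonzero on failure. */
typedef int (*toss_fn)(int node);

/*
 * Hypothetical stand-in for the per-node page cache toss.  A real
 * version would invoke the (not yet existing) kernel interface here.
 */
static int toss_placeholder(int node)
{
	return node >= 0 ? 0 : -1;
}

/*
 * Fork one child per node in [0, nr_nodes); each child runs toss(node)
 * and exits with status 0 on success.  Returns the number of nodes for
 * which the fork failed or the child did not exit cleanly.
 */
static int toss_nodes_in_parallel(int nr_nodes, toss_fn toss)
{
	int started = 0;
	int failed = 0;

	assert(nr_nodes >= 0 && toss != NULL);

	for (int node = 0; node < nr_nodes; node++) {
		pid_t pid = fork();

		if (pid < 0) {
			failed++;		/* could not start this worker */
			continue;
		}
		if (pid == 0)
			_exit(toss(node) == 0 ? 0 : 1);	/* child */
		started++;
	}

	/* Reap every child that was started and count the failures. */
	while (started-- > 0) {
		int status;

		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed++;
	}
	return failed;
}
```

A batch scheduler prolog could call toss_nodes_in_parallel(nr_nodes,
toss_placeholder) and treat a nonzero return as "some nodes were not cleaned".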

> - Dropping "as much pagecache as possible" might be a bit crude.  I
>   wonder if we should pass in some additional parameter which specifies how
>   much of the node's pagecache should be removed.
> 
>   Or, better, specify how much free memory we will actually require on
>   this node.  The syscall terminates when it determines that enough
>   pagecache has been removed.

Our thoughts exactly.  This is clearly a "big hammer" and we want to
make a lighter hammer to free up a certain number of pages.  Indeed,
we would like to have these calls occur automatically from __alloc_pages()
when we try to allocate local storage and find that there isn't any.
For our workloads, we want to free up unmapped, clean pagecache, if that
is what is keeping us from allocating a local page.  Not all workloads
want that, however, so we would probably use a sysctl() to enable/disable
this.

However, the first step is to do this manually from user space.
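
As a rough illustration of the "lighter hammer" semantics (stop as soon as
enough memory is free, and touch only clean, unmapped pagecache), here is a toy
userspace model.  It is not kernel code: struct node_model, its fields, and
reclaim_until_free() are all invented for this sketch, and the batch size is
an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one node's memory, counted in pages (hypothetical). */
struct node_model {
	long free_pages;
	long clean_unmapped_cache;	/* the only pages we may drop */
	long other_pages;		/* mapped, dirty, anon, slab, ... */
};

/*
 * Drop clean, unmapped pagecache in batches until the node has at
 * least pages_wanted_free free pages or nothing droppable remains.
 * Returns the number of pages reclaimed.
 */
static long reclaim_until_free(struct node_model *n, long pages_wanted_free,
			       long batch)
{
	long reclaimed = 0;

	assert(n != NULL && batch > 0);

	while (n->free_pages < pages_wanted_free &&
	       n->clean_unmapped_cache > 0) {
		long short_by = pages_wanted_free - n->free_pages;
		long take = n->clean_unmapped_cache < batch ?
			    n->clean_unmapped_cache : batch;

		if (take > short_by)
			take = short_by;	/* do not overshoot the target */

		n->clean_unmapped_cache -= take;
		n->free_pages += take;
		reclaimed += take;
	}
	return reclaimed;
}
```

The caller can compare the node's free_pages against the target afterwards to
see whether the request was fully satisfied.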

> 
> - To make the syscall more general, we should be able to reclaim mapped
>   pagecache and anonymous memory as well.
> 
> 
> So what it comes down to is
> 
> sys_free_node_memory(long node_id, long pages_to_make_free, long what_to_free)
> 
> where `what_to_free' consists of a bunch of bitflags (unmapped pagecache,
> mapped pagecache, anonymous memory, slab, ...).

Do we have to implement all of those, or just allow for the possibility of
their being implemented in the future?  E.g., in our case we'd just implement
the bit that says "unmapped pagecache".
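
One way to leave room for the other bits without implementing them is
sketched below.  The flag names, their values, and check_what_to_free() are
hypothetical; only the four categories come from the proposal above.  The
choice of -ENOSYS for "defined but not implemented yet" is also just an
assumption made for this example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag names and values for what_to_free. */
#define FREE_UNMAPPED_PAGECACHE	(1L << 0)
#define FREE_MAPPED_PAGECACHE	(1L << 1)
#define FREE_ANON_MEMORY	(1L << 2)
#define FREE_SLAB		(1L << 3)

#define FREE_ALL_KNOWN	(FREE_UNMAPPED_PAGECACHE | FREE_MAPPED_PAGECACHE | \
			 FREE_ANON_MEMORY | FREE_SLAB)

/* Only this bit would be implemented in the first version. */
#define FREE_IMPLEMENTED	FREE_UNMAPPED_PAGECACHE

/*
 * Validate the what_to_free argument.
 * Returns 0 if every requested bit is implemented, -EINVAL if no bits
 * or unknown bits are set, and -ENOSYS if a defined but not yet
 * implemented bit is requested.
 */
static int check_what_to_free(long what_to_free)
{
	/* Implemented bits must be a subset of the defined bits. */
	assert((FREE_IMPLEMENTED & ~FREE_ALL_KNOWN) == 0);

	if (what_to_free == 0 || (what_to_free & ~FREE_ALL_KNOWN))
		return -EINVAL;
	if (what_to_free & ~FREE_IMPLEMENTED)
		return -ENOSYS;
	return 0;
}
```

With this shape, adding mapped pagecache or slab later only means widening
FREE_IMPLEMENTED; callers that pass only the unmapped-pagecache bit are
unaffected.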



-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

